Finally getting new tasks only seconds after running out. May not be worth the hassle.


Joseph Stateson
Joined: 18 Nov 08
Posts: 233
Credit: 1,267,178,262
RAC: 109,791
Message 69225 - Posted: 3 Nov 2019, 23:24:42 UTC

Ideally the project should download a few tasks on every upload but life is not fair.

The best I could do previously was to send an "UPDATE" message about 2-3 minutes after the last task completed. That gave an average of 7 minutes of idle time, compared with 12-15 minutes without the update. With 6 GPUs each handling 5 concurrent tasks at 55 seconds per work unit, I was losing roughly 50 tasks every 2 hours or so, probably 500 a day minimum, during just those 7 minutes. I spent a long time looking at this and it quickly became a "challenge", even though the amount of credit was small.

I had to create a pair of clients: "slave" and "master". Both start up within seconds of one another and both exit when idle. There is actually a command to do that, "boinc.exe --exit_when_idle", which was convenient. All I had to do (there was more*** of course) was to have each program send the message "allow_new_tasks" to the other, with each program running in a "goto" loop. The idea is that the slave introduces itself to Milkyway but does not ask for data. The master starts right in and, as soon as the last work unit is crunched, tells the slave it is time to start, and vice versa.

The scripts and the raw output are here
https://stateson.net/images/mw_chatter_m_s.txt

However, I pasted the important stuff below

Slave task started at:

03-Nov-2019 15:12:30 [---] Running under account jstateson...
---
---
45 minutes later it ran out of data, exited and started right back up
---
03-Nov-2019 15:57:59 [Milkyway@Home] Reporting 5 completed tasks
03-Nov-2019 15:57:59 [Milkyway@Home] Not requesting tasks: "no new tasks" requested via Manager
03-Nov-2019 15:58:00 [Milkyway@Home] Scheduler request completed
03-Nov-2019 15:58:00 [---] exiting because no more results
03-Nov-2019 15:58:00 [---] Time to exit
03-Nov-2019 15:58:00 [---] Starting BOINC client version 7.15.0 for windows_x86_64

The message "allow new tasks" was sent by the slave to the master at exactly 15:58:00, as shown below.
The master was started 12 seconds after the slave.

03-Nov-2019 15:12:42 [---] Starting BOINC client version 7.15.0 for windows_x86_64
---
---
---
03-Nov-2019 15:12:44 Initialization completed
03-Nov-2019 15:15:07 [Milkyway@Home] project resumed by user
03-Nov-2019 15:58:00 [Milkyway@Home] work fetch resumed by user
03-Nov-2019 15:58:01 [Milkyway@Home] Sending scheduler request: To fetch work.
03-Nov-2019 15:58:01 [Milkyway@Home] Requesting new tasks for AMD/ATI GPU
03-Nov-2019 15:58:06 [Milkyway@Home] Scheduler request completed: got 900 new tasks


=================improvement============
15:57:59 the slave is out of data
15:58:06 the master got 900 tasks

About 7 seconds of idle time, and I manually counted about 8 seconds before all 6 GPUs had 5 tasks running on each one.

Anyway, if anyone wants to try this, the scripts I used are listed at the above URL.
***Unfortunately, it was not possible to implement this without a few modifications to the boinc client using VS2013.
I can put the source code changes on GitHub if anyone wants to try building the app. I could use some help with the remaining debugging and with some features, such as buffering up unprocessed work units to survive a scheduled offline period.

Joseph Stateson
Message 69238 - Posted: 9 Nov 2019, 3:59:08 UTC

That master-slave project is on hold as I found a simpler way.

The problem I found is that, on a fast system, one cannot upload results every time when asking for data, so I made a mod to check whether the elapsed time since the last upload is greater than 256 seconds and to upload results only when that is the case. This actually worked too well, as I started to get over 900 total work units, so I had to back off to allow more results to be uploaded. Currently, on my system, there are around 800 units at any one time, which is nice if Milkyway goes offline as I can continue to crunch.

Program is here
https://github.com/JStateson/MilkywayNewWork

check out the document sample_mw_work_flow.txt

!!!!!!!!!!!!!Use at your own risk.!!!!!!!!!!!!!!!

mikey
Joined: 8 May 09
Posts: 2353
Credit: 442,996,718
RAC: 374,613
Message 69239 - Posted: 9 Nov 2019, 12:06:44 UTC - in response to Message 69238.  

In your work flow file you seem to be sending more units than you get back from MW, but the 'wait_interval' does seem to be working for you. Do you think a longer interval would make the number being returned and the number being sent to you by MW even out?

Joseph Stateson
Message 69241 - Posted: 9 Nov 2019, 13:11:10 UTC - in response to Message 69239.  


The delay just needs to be big enough to satisfy the project's requirement of "shut up for x seconds". Looking at "sched_reply_milkyway.cs.rpi.edu_milkyway.xml" I see
<request_delay>91.000000</request_delay>

So one could go down to 92.
The 256 keeps me at a total of near 900. I have been running it on two machines overnight, both Windows, and the total tasks remain near 900. I believe the client queues them up in order of arrival (FIFO), so there should be no stale work units even if the total count never drops below about 850-880.
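
The rule described here (read the project's request_delay and wait at least one second longer) can be sketched in Python. This is only an illustration: SAMPLE_REPLY is a trimmed-down, hypothetical stand-in for the real sched_reply file, and the function name min_update_interval is made up for this example.

```python
import math
import re

# Hypothetical, trimmed-down stand-in for
# sched_reply_milkyway.cs.rpi.edu_milkyway.xml. Only the <request_delay>
# line comes from the post; the wrapper element is an assumption.
SAMPLE_REPLY = """<scheduler_reply>
<request_delay>91.000000</request_delay>
</scheduler_reply>
"""


def min_update_interval(reply_text: str) -> int:
    """Return the smallest whole number of seconds strictly above request_delay."""
    match = re.search(r"<request_delay>\s*([0-9.]+)\s*</request_delay>", reply_text)
    if match is None:
        raise ValueError("no <request_delay> element found")
    return math.floor(float(match.group(1))) + 1


print(min_update_interval(SAMPLE_REPLY))
```

For the 91.000000 value quoted above this prints 92. A regular expression is used instead of an XML parser so the sketch does not depend on the reply file being strictly well-formed.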

Working on a Linux version.

I have only tested this with one project, "milkyway", but the mod I made tests for that project and does not muck with scheduling for other projects.

The other system I am testing this on has a pair of rx560s and one each of rx580 and hd7950, so it is slower. It has a total count of between 890 and 905 work units.

mikey
Message 69242 - Posted: 9 Nov 2019, 14:22:01 UTC - in response to Message 69241.  


Are you sure this is a cc_config.xml file and not an app_config.xml file? When I put it in my Boinc directory it gives me error messages:

11/9/2019 9:19:22 AM | | Unrecognized tag in cc_config.xml: <mw_low_water_pct>
11/9/2019 9:19:22 AM | | Unrecognized tag in cc_config.xml: <mw_high_water_pct>
11/9/2019 9:19:22 AM | | Unrecognized tag in cc_config.xml: <mw_wait_interval>

Joseph Stateson
Message 69243 - Posted: 9 Nov 2019, 14:29:18 UTC - in response to Message 69242.  
Last modified: 9 Nov 2019, 14:31:24 UTC


The old program will give that error message as it does not know about the new variables that were added.
You have to use the new .exe.
Rename the old one to boinc_old.exe or whatever.

Joseph Stateson
Message 69244 - Posted: 9 Nov 2019, 17:42:11 UTC

Got the Ubuntu version to work.

The win32, win64 and Ubuntu executables are all at

https://github.com/JStateson/MilkywayNewWork

gambatesa
Joined: 23 Feb 18
Posts: 26
Credit: 3,541,202,278
RAC: 5,587,008
Message 69265 - Posted: 19 Nov 2019, 16:43:43 UTC

I can't fully understand how it works.

Can you please detail the setup process?
Want your Kids stay off from Drugs? Get them building Crunching PC's and they'll never have enough money for drugs

Joseph Stateson
Message 69268 - Posted: 20 Nov 2019, 0:10:35 UTC - in response to Message 69265.  
Last modified: 20 Nov 2019, 0:20:35 UTC

I used VS2013 to build a new version of boinc, "7.15.1". The boinc people have put together a nice package for building on Windows and also Linux.

I made a change to the source code of cs_schedule.cpp to keep boinc from asking the milkyway project for more data on every upload. It lets a minimum of 256 seconds go by before asking for more data. This allows one or two uploads to occur before data is requested, and that seems to be what is needed to avoid the "you asked for data too soon" response or the required 91 second limit.
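
The throttling idea in the paragraph above can be illustrated with a small Python sketch. It is not the actual change made to cs_schedule.cpp: the class WorkFetchThrottle and its method are hypothetical names, and the simulation simply assumes the client contacts the scheduler every 91 seconds.

```python
class WorkFetchThrottle:
    """Allow a work request only if at least wait_interval seconds have passed
    since the previous work request (hypothetical helper, not BOINC code)."""

    def __init__(self, wait_interval: float = 256.0) -> None:
        self.wait_interval = wait_interval
        self.last_request = None  # time of the last allowed work request

    def may_request_work(self, now: float) -> bool:
        if self.last_request is None or now - self.last_request >= self.wait_interval:
            self.last_request = now
            return True
        return False


# Simulate a scheduler contact every 91 seconds (an assumption for this demo).
throttle = WorkFetchThrottle(wait_interval=256.0)
decisions = [(t, throttle.may_request_work(t)) for t in range(0, 600, 91)]
print(decisions)
```

In this simulation work is requested at 0, 273 and 546 seconds, and the two contacts in between would only report results, which is in line with the "one or two uploads" mentioned above.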

The changes to the program are here
https://github.com/JStateson/MilkywayNewWork
All you need (ha ha) is Visual Studio 2013 or earlier, plus the sources and Windows dependencies downloaded from Berkeley. I can help you through the download from Berkeley but you will have to find your own VS2013 iso file. I don't recommend a torrent that pulls from any Eastern Bloc countries. No telling what "extra stuff" was included in the package. Once you have downloaded and built the original, download my changes and add them in.

If you don't want to do this, you can download the boinc executable files I put there at GitHub. You will have to answer a lot of "are you sure" questions as I did not buy any certificates that "bless" the download. The file needs to go in program files\boinc if 64 bit, or get the 32 bit one for the x86 program folder. You might want to rename the original executable to boinc_original.exe.

I deleted the Linux one because it was built for my 18.04 and probably would not work on other Linux systems. It is actually a lot easier to build the Linux version as there is no need to hunt down a 7+ year old compiler or get all the Windows dependencies.

It is not necessary to replace the cc_config.xml with the one I put at GitHub.

What the change does is, about every four minutes or so, ask for data from milkyway and download enough to bring your total count up to 900 or whatever the limit is.

Let me know if there is a problem. I tested it with World Community Grid to make sure other projects are not affected. It only delays the milkyway project; no others are affected. I have no way of testing the 32 bit version, nor do I have a copy of XP, Vista or Win7 to test on, so if there is a problem let me know.

gambatesa
Message 69271 - Posted: 20 Nov 2019, 13:11:28 UTC - in response to Message 69268.  
Last modified: 20 Nov 2019, 13:12:27 UTC


The latest version available for download is 7.14.2

Where can I download these new binaries?

Joseph Stateson
Message 69272 - Posted: 20 Nov 2019, 14:41:34 UTC - in response to Message 69271.  
Last modified: 20 Nov 2019, 15:11:42 UTC



http://stateson.net/bthistory/boinc_x64_for_milkyway.zip

The following procedure assumes that your original boinc.exe is at "/Program Files/boinc"

I do not have an install procedure so it must be installed manually

Extract the boinc.exe file from the zip archive and save it at /Downloads or where convenient
It can only be executed from the program directory so trying "boinc.exe --version" will tell you files are missing

You must stop boinc from executing before replacing it.
To stop boinc, first bring up the boinc manager, then exit the boinc manager and choose to stop programs from executing.

After stopping boinc you should rename the original program from boinc.exe to old_boinc.exe

Copy the new program into the /Program Files/boinc folder


Starting up the boinc manager should also start up boinc. Check to see if the version is 7.15.0 for the new program. After a few minutes of looking at the event message you should notice a download of a few files. Eventually the number of work units waiting to be processed will rise up and hover near the maximum. The only time it will drop to 0 is when the project goes off-line. On my system the count stays between 850 - 890 all the time.



I have shortcuts for starting and stopping boinc but the normal startup for boinc must be removed from the windows registry or a conflict arises. PM me if you want to do this. They are not needed to get this milkyway version to work.

Let me know if there is a problem and I can put together a better set of instructions.

[EDIT]
for 32 bit systems (I have no way of testing this and no longer have xp, vista, 7 or 8)
http://stateson.net/bthistory/boinc_x32_for_milkyway.zip

VietOZ
Joined: 28 Mar 18
Posts: 14
Credit: 744,490,433
RAC: 2,258,922
Message 69283 - Posted: 22 Nov 2019, 5:57:18 UTC - in response to Message 69272.  

I simply alter the coproc file so I can get the max of 900, then run an update command every 92 seconds.

For linux:
watch -n 92 boinccmd --project http://milkyway.cs.rpi.edu/milkyway/ update


for windows:
:top
"C:\Program Files\BOINC\boinccmd" --passwd PASSWORD --project http://milkyway.cs.rpi.edu/milkyway/ update

TIMEOUT /T 92

goto top
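
The same periodic update can also be sketched as one cross-platform Python script. This is a rough equivalent under stated assumptions, not something from the posts above: it assumes boinccmd is on the PATH (otherwise pass the full path), and the helper names build_update_command and run_update_loop are made up for this example. Only the boinccmd options --host, --passwd and --project ... update are used.

```python
import subprocess
import time

PROJECT_URL = "http://milkyway.cs.rpi.edu/milkyway/"


def build_update_command(boinccmd="boinccmd", password=None, host=None):
    """Build the boinccmd argument list for a project update."""
    cmd = [boinccmd]
    if host:
        cmd += ["--host", host]
    if password:
        cmd += ["--passwd", password]
    cmd += ["--project", PROJECT_URL, "update"]
    return cmd


def run_update_loop(interval_seconds=92, **kwargs):
    """Send an update request, sleep, and repeat until interrupted."""
    cmd = build_update_command(**kwargs)
    while True:
        subprocess.run(cmd, check=False)  # keep looping even if one call fails
        time.sleep(interval_seconds)
```

Calling run_update_loop(password="PASSWORD") would then behave like the batch loop above; stop it with Ctrl+C.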

Joseph Stateson
Message 69286 - Posted: 22 Nov 2019, 14:18:17 UTC - in response to Message 69283.  
Last modified: 22 Nov 2019, 14:45:16 UTC

Yes, that works and you might also use
"C:\Program Files\BOINC\boinccmd --host hostname:port -passwd….." 
for remote systems

It is also possible to get more than 900 but that is only useful if the project goes offline for maintenance as you can continue to crunch until it comes back online. It is easier to do that with a change to boinc but it could still be done using those same update commands, a change to the coproc file and multiple clients. It would be better if the project could handle this problem.

The SETI GPU users club has a secret Boinc client they share among themselves to bypass project and Boinc download restrictions. I did not want to join their club so it became a challenge for me to come up with the same type of mod to the client. I am making all my changes public on GitHub for anyone to see.

VietOZ
Message 69287 - Posted: 22 Nov 2019, 19:36:41 UTC - in response to Message 69286.  

Great work @JStateson!
From what I understand, the Seti guys only made the custom Boinc for Linux. If we're running Windows then we'd have to recompile our own, like you did. I'm running Milky on W10, so my cap is 900. But again, like you said, we can set up multiple instances to grab work if we anticipate a long downtime.

Joseph Stateson
Message 69289 - Posted: 22 Nov 2019, 20:56:31 UTC - in response to Message 69287.  

Thanks VietOZ!

With 6 GPUs I am averaging just over 7 seconds per work unit, so 900 units last only about 2 hours. I am currently running 2 clients on that same system with each client getting 900 units. This will last for a total of 4 hours. I could run 6 clients and spoof the number of GPUs to allow me to crunch through about 12 hours of downtime. I accidentally deleted 900 work units setting up the second client, but I now know how to do it correctly and am working on a script to automate the extra clients.
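
The buffer arithmetic above can be checked with a few lines of Python. This is only a worked example: the function name buffer_hours is made up, and 7.0 seconds per work unit is used as a lower bound for "just over 7 seconds".

```python
def buffer_hours(clients, units_per_client=900, seconds_per_unit=7.0):
    """Hours of crunching covered by the queued work units."""
    return clients * units_per_client * seconds_per_unit / 3600.0


for clients in (1, 2, 6):
    print(clients, "client(s):", round(buffer_hours(clients), 2), "hours")
```

At exactly 7 seconds per unit this gives 1.75, 3.5 and 10.5 hours. At 8 seconds per unit the same formula gives 2, 4 and 12 hours, which matches the rounded figures in the post.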

gambatesa
Message 69298 - Posted: 24 Nov 2019, 21:04:24 UTC - in response to Message 69289.  

I installed the first clients. It's too early to evaluate the improvements, but over the next few days I'll keep monitoring the logs and RAC.

Thanks for your help, it is really appreciated by me and by the community.

jpmboy
Joined: 29 Apr 17
Posts: 33
Credit: 2,877,659,312
RAC: 7,696,608
Message 69352 - Posted: 18 Dec 2019, 1:09:11 UTC - in response to Message 69272.  



Thank you for posting this. I tried using your modded boinc.exe as you described. Initially there was no change in the number of tasks downloaded at any one time (always 300-ish with 3 GPUs installed), or in the accumulation of tasks over a day (the GPUs remain inactive for roughly 30-40% of the time). So I then added some of the "options" and settings from the cc_config.xml file included in the download and voilà... I actually got 425 tasks... once. Then the event log shows we're back to 300-ish tasks after waiting several 91 sec cycles, and occasionally there is no "fetch" for 10 min.
Most times the fetch command is as follows:
12/17/2019 8:00:35 PM | Milkyway@Home | Sending scheduler request: To fetch work.
12/17/2019 8:00:35 PM | Milkyway@Home | Reporting 34 completed tasks
12/17/2019 8:00:35 PM | Milkyway@Home | Requesting new tasks for CPU and NVIDIA GPU
12/17/2019 8:00:35 PM | Milkyway@Home | [sched_op] CPU work request: 16278725.09 seconds; 0.00 devices
12/17/2019 8:00:35 PM | Milkyway@Home | [sched_op] NVIDIA GPU work request: 1553755.01 seconds; 0.00 devices
12/17/2019 8:00:37 PM | Milkyway@Home | Scheduler request completed: got 0 new tasks
12/17/2019 8:00:37 PM | Milkyway@Home | [sched_op] Server version 713
12/17/2019 8:00:37 PM | Milkyway@Home | No tasks sent
12/17/2019 8:00:37 PM | Milkyway@Home | Project requested delay of 91 seconds


Once it receives the allocation of tasks, the GPU count drops to 0.00 and no further tasks are downloaded (but the CPU task list adds one or more tasks, accumulating 2-3 days of work according to BoincTasks):

12/17/2019 7:51:12 PM | Milkyway@Home | [sched_op] Starting scheduler request
12/17/2019 7:51:12 PM | Milkyway@Home | Sending scheduler request: To fetch work.
12/17/2019 7:51:12 PM | Milkyway@Home | Reporting 1 completed tasks
12/17/2019 7:51:12 PM | Milkyway@Home | Requesting new tasks for CPU and NVIDIA GPU
12/17/2019 7:51:12 PM | Milkyway@Home | [sched_op] CPU work request: 16283322.16 seconds; 0.00 devices
12/17/2019 7:51:12 PM | Milkyway@Home | [sched_op] NVIDIA GPU work request: 1555200.00 seconds; 3.00 devices
12/17/2019 7:51:14 PM | Milkyway@Home | Scheduler request completed: got 328 new tasks
12/17/2019 7:51:14 PM | Milkyway@Home | [sched_op] Server version 713
12/17/2019 7:51:14 PM | Milkyway@Home | Project requested delay of 91 seconds
12/17/2019 7:51:14 PM | Milkyway@Home | [sched_op] estimated total CPU task duration: 0 seconds
12/17/2019 7:51:14 PM | Milkyway@Home | [sched_op] estimated total NVIDIA GPU task duration: 19186 seconds
12/17/2019 7:51:14 PM | Milkyway@Home | [sched_op] handle_scheduler_reply(): got ack for task de_modfit_80_bundle4_4s_south4s_bgset_2_1574164502_15520017_1
12/17/2019 7:51:14 PM | Milkyway@Home | [sched_op] Deferring communication for 00:01:31
12/17/2019 7:51:14 PM | Milkyway@Home | [sched_op] Reason: requested by project
12/17/2019 7:52:45 PM | Milkyway@Home | [sched_op] Starting scheduler request
12/17/2019 7:52:45 PM | Milkyway@Home | Sending scheduler request: To fetch work.
12/17/2019 7:52:45 PM | Milkyway@Home | Reporting 19 completed tasks
12/17/2019 7:52:45 PM | Milkyway@Home | Requesting new tasks for CPU and NVIDIA GPU
12/17/2019 7:52:45 PM | Milkyway@Home | [sched_op] CPU work request: 16277298.29 seconds; 0.00 devices
12/17/2019 7:52:45 PM | Milkyway@Home | [sched_op] NVIDIA GPU work request: 1552420.62 seconds; 0.00 devices
12/17/2019 7:52:47 PM | Milkyway@Home | Scheduler request completed: got 2 new tasks
12/17/2019 7:52:47 PM | Milkyway@Home | [sched_op] Server version 713
12/17/2019 7:52:47 PM | Milkyway@Home | Project requested delay of 91 seconds


So... I still can't get 200 tasks per GPU, and it is certainly not picking up GPU tasks until several "fetch" requests after the last GPU task has uploaded.

I'm running win10, a 7980XE, 3 Titan Vs... running 6 tasks per GPU. So unbelievably, adding the third titan V, rather than getting more tasks and processing more tasks, is actually resulting in more idle time as the lot of 300 tasks process in 2/3 the time.
Crazy!

Joseph Stateson
Message 69354 - Posted: 18 Dec 2019, 6:10:03 UTC - in response to Message 69352.  
Last modified: 18 Dec 2019, 6:39:19 UTC



Sent you a private message about this.
On the 7.15.0 version use the following cc_config.xml

<cc_config>
<log_flags>
<task>0</task>
<work_fetch_debug>0</work_fetch_debug>
<sched_ops>1</sched_ops>
<file_xfer>1</file_xfer>
<file_xfer_debug>0</file_xfer_debug>
<mw_debug>0</mw_debug>
</log_flags>
<options>
<use_all_gpus>1</use_all_gpus>
<allow_remote_gui_rpc>1</allow_remote_gui_rpc>
<mw_low_water_pct>1</mw_low_water_pct>
<mw_high_water_pct>16</mw_high_water_pct>
<mw_wait_interval >512</mw_wait_interval>
</options>
</cc_config>

It has no effect on the 7.14.666 version
The above increases the delay from 256 to 512 to allow more time to download

all you need is to add the line
<mw_wait_interval >512</mw_wait_interval>

if it does not help then try the 7.14.666 one

If it (the 512) makes an improvement then let me know and I will add that option back into my 7.14.666 version

jpmboy
Message 69355 - Posted: 18 Dec 2019, 13:59:17 UTC - in response to Message 69354.  

Thanks. "good copy". :) When this last loaded batch finishes I'll make the changes you suggested here and in the reg... and PM you back.

Joseph Stateson
Message 69356 - Posted: 18 Dec 2019, 15:26:56 UTC - in response to Message 69355.  
Last modified: 18 Dec 2019, 15:29:11 UTC

I don't see why you are having a problem with my 7.15.0 version. Several other systems here run that version. You should be downloading 900 at a time with 3 GPUs, and the project max is 900 as I recall.

If you do try that cc_config, I had a typo. There is not supposed to be a space before the ">"
<mw_wait_interval >512</mw_wait_interval>



the following is correct

<mw_wait_interval>512</mw_wait_interval>