Message boards :
Number crunching :
scheduler request timeout when reporting more than 1 result at once
Message board moderation
Author | Message |
---|---|
Send message Joined: 1 Jan 17 Posts: 37 Credit: 111,034,474 RAC: 35,537 |
I ran MilkyWay on two computers or so until March 20, then diverted to other projects. I don't recall any issues with MilkyWay from before March 20. This weekend I turned one of the computers back on to MilkyWay (a fast and large computer), and soon this computer got into a situation in which all of the scheduler requests to report results (and to request new work) timed out. I noticed that I had <max_tasks_reported>40</max_tasks_reported> in cc_config.xml (which isn't a lot) and decreased it now. Turns out that 2, 3 or 4 sometimes get through without timeout, but not always. None of my attempts with more than 4 tasks reported at once succeeded; they are all timing out. It seems as if the attempts with 2, 3, or 4 at once have a better chance of succeeding if the client does not request new work in the same transaction. But I am not sure about it. However, those attempts still time out very often, therefore I temporarily set "no new work" and <max_tasks_reported>1</max_tasks_reported>. And that seems never to time out. The one big downside here is that I've got ~1,800 results to report by now. I am forcing a scheduler request every 15 seconds now in hope to get this backlog cleared over night. :-( BTW, the current median task duration on this computer is about 6 minutes currently, and it runs 32 MilkyWay tasks concurrently. That is, in steady state, it would have a mean time between task completions of maybe 11 seconds or so. (Depends on the rolling average of the task sizes, obviously.) With MilkyWay's server-side request_backoff of 91 seconds, i.e. the client normally issuing requests at most every ~96 seconds or so by itself, this client would normally need to report 8 or more results at once within a singe scheduler request, if it were to run MilkyWay at full time with the current small tasks. |
Send message Joined: 24 Jan 11 Posts: 715 Credit: 555,429,972 RAC: 38,307 |
This an OLD problem that has existed at MW since its inception. It is impossible to report a task and receive a replacement task in the same scheduler connection. It can't be done without employing custom clients and workarounds. It takes two scheduler connections, first to report a completed task and then to receive the replacement task(s) on the next scheduler connection. And when you have depleted your cache and reported the last task, MW forces the client into a mandatory 10 minute backoff before allowing a new scheduler connection to replenish your cache. This was always the main complaint of high production clients when the Separation work was available. Nothing has changed in the scheduler code but the longer running N-body tasks mostly eliminate the problem because no tasks finish faster than the default 91 second scheduler connection interval. But your high production host has duplicated exactly the same issue that the Separation tasks caused. |
Send message Joined: 1 Jan 17 Posts: 37 Credit: 111,034,474 RAC: 35,537 |
My trouble were scheduler transaction failures by timeouts though, not regular responses which did not grant work and mandated a large backoff period. BTW, there is a non-negligible possibility that these timeouts were caused by my lowlife internet service provider, not by MilkyWay's project server. -------- Now this morning, as my stash of results ready to report was almost depleted, I switched the client back to <max_tasks_reported>40</max_tasks_reported>. It issued two final requests of this size and both of them succeeded = did not fail with timeout this time. I then let it request and receive new work and will leave the client doing its thing for now. I changed <report_results_immediately> from 0 to 1 though, maybe this will help to reduce the buildup of large quantities of results ready to report. Alas my internet link is not very reliable (which is why I prefer to run BOINC with a work buffer size which is deeper than half a day, during which I am not at home to fix things), hence it could happen any time again that the client needs to report the results of several hours worth of work. During the hour before I have to leave home, the client is now reporting >10 results per request on average (peaked at 31 results), and all of these scheduler transactions succeeded. Crossing fingers that it continues to work while I'm away. None of the mentioned successful requests of this morning combined the reporting of results and the requesting of new work though. The first of such combined requests of today is going to happen in my absence... |
Send message Joined: 1 Jan 17 Posts: 37 Credit: 111,034,474 RAC: 35,537 |
Everything has been working nicely today. According to logs, the client has been reporting results at the maximum request rate which the server admits. As to be expected, there were 3...11 results in each scheduler request. And about hourly, the buffer of runnable tasks dipped below 1000, the client requested more work, and got more work. No scheduler request timeouts today, as far as I can see. I will keep <report_results_immediately>1</report_results_immediately> on this host as long as I have it active at MilkyWay and the tasks are so small as they are currently. |
Send message Joined: 1 Jan 17 Posts: 37 Credit: 111,034,474 RAC: 35,537 |
The problem reappeared. When I woke up today, my first look was at the powermeter of the host, and I could tell right away that it was running NFS@home instead of MilkyWay. (I have NFS configured as backup project with 0% resource share.) Turned out that the client had stopped reporting results at some point between 4 and 5 AM, had now accumulated >600 good results and >500 error results ("process exited with code 13 (0xd, -243)", see thread 5097), and the client had chosen to back off of further scheduler requests for >1 d. Obviously, the computation errors made the client stop reporting the results. But that's not the issue here. Rather: When I started clearing this reporting backlog by manually forced project updates, scheduler requests with 40 results to report + requests for new work succeeded at first. But after the host reached a certain level of tasks in progress (that is, sum of downloading/downloaded/uploading/uploaded/error tasks) of ~1600 or so, scheduler requests with 40 as well as with 30 results to report at once failed with timeout. This failure happened regardless whether or not new work was requested in the same request. Reporting 20 results at once still worked. So I forced such requests every 10 seconds until all results were reported. I will now leave the client alone again, with <max_tasks_reported>20</max_tasks_reported> and <report_results_immediately>1</report_results_immediately>, plus I am running a monitoring script which checks every 91+5 seconds if the number of results ready to report exceeds 20, and if so, forces a project update. |
Send message Joined: 1 Jan 17 Posts: 37 Credit: 111,034,474 RAC: 35,537 |
I didn't run MilkyWay@Home for a week but restarted it half a day ago. And the problem reappeared today at 00:00 UTC. This happened: – At this point, the host had 1003 tasks in progress, reported 7 results, and requested more work. – The server assigned 1009 ! tasks to the host, and the host had now 2005 tasks in progress. – From then on, almost all scheduler requests to report results (with max_tasks_reported=20) failed with timeout. – I discovered this situation at 04:30 UTC, at which time the host still had 1752 tasks in progress, even though a background script had been forcing scheduler requests every 97 seconds as soon as more than 20 results ready to report had been piling up. To recover, I switched temporarily to max_tasks_reported=1, set the project to no new work, and engaged a script to report a result every ten seconds. Most of these requests succeed, but occasionally even these single-result reports time out. I will let the computer run another project while I am away from home today. After that I will perhaps set up multiple client instances on this computer in order to be able to maintain about half a day work buffer depth on it without having so many tasks in progress per client. |
Send message Joined: 8 May 09 Posts: 3339 Credit: 524,010,781 RAC: 0 |
I didn't run MilkyWay@Home for a week but restarted it half a day ago. Your location is hidden but do you think it could be the internet itself that's not cooperating between you and the MilkyWay Servers in Wisconsin? There are alot of reports on the various News sites about internet cables are physically down meaning routing options are limited at times. My thought is can you do an IP scan of the MilkyWay Server when it's not cooperating and see if it's up and running normally or if it's slow too. |
Send message Joined: 1 Jan 17 Posts: 37 Credit: 111,034,474 RAC: 35,537 |
mikey wrote: Your location is hidden but do you think it could be the internet itself that's not cooperating between you and the MilkyWay Servers in Wisconsin?I am based in Germany, and I do have internet service of terrible quality via coaxial cable, which is why I prefer >0.5 days deep work buffers. However: During these situations with scheduler request timeouts between this host and MilkyWay, the MW web site keeps responding very promptly, including dynamic web pages with database access such as the result tables. Second: If I reduce max_tasks_reported as far down as necessary, MW's scheduler responds very quickly to the client again too (to most but not all requests). And third: When I manage to gradually get the number of tasks in progress down again during recovery from these situations, I can gradually increase max_tasks_reported again too. The correlation between likelihood of scheduler request timeout, max_tasks_reported, and number of tasks in progress is really evident in my observations. (The three occasions reported here were with ~1,800, ~1,600, and ~2,000 tasks in progress when — or a while after — the described condition started.) Obviously, only large hosts with large work buffer setting will ever receive so many tasks from the server. I, like most everyone else, am running a stock client which does not request more work once there are 1,000 runnable tasks buffered. |
Send message Joined: 8 May 09 Posts: 3339 Credit: 524,010,781 RAC: 0 |
mikey wrote:Your location is hidden but do you think it could be the internet itself that's not cooperating between you and the MilkyWay Servers in Wisconsin?I am based in Germany, and I do have internet service of terrible quality via coaxial cable, which is why I prefer >0.5 days deep work buffers. However: I wonder if the Server has a move on to the next client setting after it's been connected more than x seconds? That way most people can get and return their tasks within that x seconds but you, because of your large cache of tasks both needed and returning, takes longer and you get disconnected? Whatever the problem it sounds like it's on the Server side as my 17 desktop pc's can get and return tasks just fine. |
Send message Joined: 1 Jan 17 Posts: 37 Credit: 111,034,474 RAC: 35,537 |
Each scheduler request to report results or/and to request more work includes a bunch of other client-side information. Among it is a list of all tasks which the client currently has got in its work buffer. Therefore, if a client has got a large number of tasks in progress, the data payload within the scheduler request gets respectively larger (by about 190 bytes per task in progress; and more than that per result which is being reported). That is, my host, having unusually many tasks in progress, sent scheduler requests with unusually large data size. And from a certain size on, the likelihood of request timeouts jumped up. (I am not sure at which size that is precisely; seems to be in the ballpark of ~300 kBytes.) Now what caused these timeouts? The MilkyWay@Home server kept responding very quickly to other requests, as I mentioned. That is, at least server-side database performance does not appear to be an issue. It is very well possible that the way how my Internet Service Provider is throttling the speed on the upstream channels of my coaxial cable connection is at fault. It seems as if my ISP changed something to the worse in this regard during the course of 2023. — Or it could be related to transcontinental routing problems which you mentioned in #77085. But if so, only large upstream transfers are affected, not small ones, nor downstream transfers of any size. — In general, I have been able to upload large result files of other projects in early 2023, but suddenly had great difficulties with it since later in 2023, but only to project servers outside of Europe. I am able to work around this for result file uploads by setting an upload speed limit in the BOINC client. But this does evidently not help with scheduler requests with large payload. So far, MilkyWay@Home is the only project at which I noticed this problem with large scheduler requests. This may or may not be related to MilkyWay@Home, as I don't know if I ever had so many tasks in progress on a single host in any other project (probably not), ever since my mentioned trouble with upstream transfers to overseas servers started to happen. Edit: @Kevin Roux, does the MilkyWay@Home HTTP server have a size limit for HTTP POST request body size configured to something at the order of magnitude of the mentioned ~300 kBytes? (I don't know HTTP and its implementations well enough to say whether or not this can be a problem. I would think that the server responds with an error status if the client issues a too large POST request, e.g. 413 Payload Too Large, rather than not responding at all. But I may be wrong.) Note to self: Reproduce the problem with <http_debug> enabled in the client. |
Send message Joined: 9 Aug 22 Posts: 82 Credit: 2,848,179 RAC: 6,872 |
I will ask the IT team that set up the HTTP settings for the server and get back to you when I have more information. |
Send message Joined: 9 Aug 22 Posts: 82 Credit: 2,848,179 RAC: 6,872 |
The configurations for this setting have been left at their default. There is no limit set on the http.conf for MilkyWay@home and the post_max_size for PHP is set to a default of 8MB so nothing on the order of ~300kBytes as far as I can tell. I am not exactly sure what the problem is. Is this something that is still happening? |
Send message Joined: 1 Jan 17 Posts: 37 Credit: 111,034,474 RAC: 35,537 |
Thanks a lot for looking up these settings. Sounds like the problem was/is with my Internet service provider then. I had been avoiding this problem lately by limiting the client's work buffer. Will do so also during the upcoming Pentathlon race. I haven't taken the time yet to provoke the issue again but with HTTP debug logging enabled in the client. |
Send message Joined: 18 Nov 22 Posts: 84 Credit: 640,530,847 RAC: 0 |
FYI, I'm having the same or similar problem on one of my hosts. will not report any tasks. just says HTTP internal server error. wont even report 1 task. I enabled http_debug logging in the BOINC client: Tue 14 May 2024 11:35:21 AM EDT | Milkyway@home | Reporting 1 completed tasks Tue 14 May 2024 11:35:21 AM EDT | Milkyway@home | Requesting new tasks for CPU Tue 14 May 2024 11:35:21 AM EDT | Milkyway@home | [sched_op] CPU work request: 1.00 seconds; 477.00 devices Tue 14 May 2024 11:35:21 AM EDT | Milkyway@home | [http] HTTP_OP::init_post(): https://milkyway.cs.rpi.edu/milkyway_cgi/cgi Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Info: Hostname milkyway.cs.rpi.edu was found in DNS cache Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Info: Trying 128.113.126.54:443... Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Info: Connected to milkyway.cs.rpi.edu (128.113.126.54) port 443 (#241) Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Info: ALPN, offering http/1.1 Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Info: CAfile: /etc/ssl/certs/ca-certificates.crt Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Info: CApath: /etc/ssl/certs Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Info: TLSv1.0 (OUT), TLS header, Certificate Status (22): Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Info: TLSv1.3 (OUT), TLS handshake, Client hello (1): Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Info: TLSv1.2 (IN), TLS header, Certificate Status (22): Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Info: TLSv1.3 (IN), TLS handshake, Server hello (2): Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Info: TLSv1.2 (IN), TLS header, Finished (20): Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Info: TLSv1.2 (IN), TLS header, Supplemental data (23): Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Info: TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8): Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Info: TLSv1.2 (IN), TLS header, Supplemental data (23): Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Info: TLSv1.3 (IN), TLS handshake, Certificate (11): Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Info: TLSv1.2 (IN), TLS header, Supplemental data (23): Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Info: TLSv1.3 (IN), TLS handshake, CERT verify (15): Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Info: TLSv1.2 (IN), TLS header, Supplemental data (23): Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Info: TLSv1.3 (IN), TLS handshake, Finished (20): Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Info: TLSv1.2 (OUT), TLS header, Finished (20): Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Info: TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1): Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Info: TLSv1.2 (OUT), TLS header, Supplemental data (23): Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Info: TLSv1.3 (OUT), TLS handshake, Finished (20): Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Info: SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384 Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Info: ALPN, server accepted to use http/1.1 Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Info: Server certificate: Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Info: subject: C=US; ST=New York; O=Rensselaer Polytechnic Institute; CN=milkyway.cs.rpi.edu Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Info: start date: Nov 1 00:00:00 2023 GMT Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Info: expire date: Oct 31 23:59:59 2024 GMT Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Info: subjectAltName: host "milkyway.cs.rpi.edu" matched cert's "milkyway.cs.rpi.edu" Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Info: issuer: C=US; O=Internet2; CN=InCommon RSA Server CA 2 Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Info: SSL certificate verify ok. Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Info: TLSv1.2 (OUT), TLS header, Supplemental data (23): Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Sent header to server: POST /milkyway_cgi/cgi HTTP/1.0 Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Sent header to server: Host: milkyway.cs.rpi.edu Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Sent header to server: User-Agent: BOINC client (x86_64-pc-linux-gnu 7.19.0) Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Sent header to server: Accept: */* Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Sent header to server: Accept-Encoding: deflate, gzip, br, zstd Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Sent header to server: Accept-Language: en_US Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Sent header to server: Content-Length: 786318 Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Sent header to server: Content-Type: application/x-www-form-urlencoded Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Sent header to server: Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Sent header to server: <host_venue>home</host_venue> Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Sent header to server: <email_hash>aef369d7ae0a7a5d399e5eabbe035364</email_hash> Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Sent header to server: <cross_project_id>2c5734cf793bd8619ef49eef09973627</cross_project_id> Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Sent header to server: <external_cpid>66b4359476520659b8db798e6d933131</external_cpid> Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Sent header to server: <cpid_time>1619197870.000000</cpid_time> Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Sent header to server: <user_total_credit>57588797.926310</user_total_credit> Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Sent header to server: <user_expavg_credit>535681.538026</user_expavg_credit> Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Sent header to server: <user_create_time>1619197870.000000</user_create_time> Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Sent header to server: <rpc_seqno>324268</rpc_seqno> Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Sent header to server: <userid>472486</userid> Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Sent header to server: <teamid>1832</teamid> Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Sent header to server: <hostid>748899</hostid> Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Sent header to server: <host_total_credit>24883644.120596</host_total_credit> Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Sent header to server: <host_expavg_credit>256857.206834</host_expavg_credit> Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Sent header to server: <host_create_time>1692200344.000000</host_create_time> Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Sent header to server: <nrpc_failures>2</nrpc_failures> Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Sent header to server: <master_fetch_failures>0</master_fetch_failures> Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Sent header to server: <min_rpc_time>0.000000</min_rpc_time> Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Sent header to server: <next_rpc_time>0.000000</next_rpc_time> Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Sent header to server: <rec>44923.877074</rec> Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Sent header to server: <rec_time>1715700921.559540</rec_time> Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Sent header to server: <resource_share>100.000000</resource_share> Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Sent header to server: <disk_usage>44412928.000000</disk_usage> Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Sent header to server: <disk_share>22493467681.801941</disk_share> Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Sent header to server: <desired_disk_usage>0.000000</desired_disk_usage> Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Sent header to server: <duration_correction_factor>1.000000</duration_correction_factor> Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Sent header to server: <sched_rpc_pending>0</sched_rpc_pending> Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Sent header to server: <send_time_stats_log>0</send_time_stats_log> Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Sent header to server: <send_job_log>0</send_job_log> Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Sent header to server: <njobs_success>367769</njobs_success> Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Sent header to server: <njobs_error>215</njobs_error> Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Sent header to server: <elapsed_time>1637200129.160292</elapsed_time> Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Sent header to server: <last_rpc_time>1714613301.807059</last_rpc_time> Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Sent header to server: <anonymous_platform/> Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Sent header to server: <dont_use_dcf/> Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Sent header to server: <suspended_via_gui/> Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Sent header to server: <rsc_backoff_time> Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Sent header to server: <name>CPU</name> Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Sent header to server: <value>0.000000</value> Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Sent header to server: </rsc_backoff_time> Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Sent header to server: <rsc_backoff_interval> Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Sent header to server: Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Info: TLSv1.2 (OUT), TLS header, Supplemental data (23): Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Info: TLSv1.2 (OUT), TLS header, Supplemental data (23): Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Info: TLSv1.2 (OUT), TLS header, Supplemental data (23): Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Info: TLSv1.2 (OUT), TLS header, Supplemental data (23): Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Info: TLSv1.2 (OUT), TLS header, Supplemental data (23): Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Info: TLSv1.2 (OUT), TLS header, Supplemental data (23): Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Info: TLSv1.2 (OUT), TLS header, Supplemental data (23): Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Info: TLSv1.2 (OUT), TLS header, Supplemental data (23): Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Info: TLSv1.2 (IN), TLS header, Supplemental data (23): Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Info: TLSv1.3 (IN), TLS handshake, Newsession Ticket (4): Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Info: TLSv1.2 (IN), TLS header, Supplemental data (23): Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Info: TLSv1.3 (IN), TLS handshake, Newsession Ticket (4): Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Info: old SSL session ID is stale, removing Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Info: TLSv1.2 (OUT), TLS header, Supplemental data (23): Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Info: TLSv1.2 (OUT), TLS header, Supplemental data (23): Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Info: TLSv1.2 (OUT), TLS header, Supplemental data (23): Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Info: TLSv1.2 (OUT), TLS header, Supplemental data (23): Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Info: TLSv1.2 (OUT), TLS header, Supplemental data (23): Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Info: TLSv1.2 (OUT), TLS header, Supplemental data (23): Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Info: TLSv1.2 (OUT), TLS header, Supplemental data (23): Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Info: TLSv1.2 (OUT), TLS header, Supplemental data (23): Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Info: TLSv1.2 (OUT), TLS header, Supplemental data (23): Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Info: TLSv1.2 (OUT), TLS header, Supplemental data (23): Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Info: TLSv1.2 (OUT), TLS header, Supplemental data (23): Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Info: TLSv1.2 (OUT), TLS header, Supplemental data (23): Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Info: TLSv1.2 (OUT), TLS header, Supplemental data (23): Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Info: TLSv1.2 (OUT), TLS header, Supplemental data (23): Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Info: TLSv1.2 (OUT), TLS header, Supplemental data (23): Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Info: TLSv1.2 (OUT), TLS header, Supplemental data (23): Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Info: TLSv1.2 (OUT), TLS header, Supplemental data (23): Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Info: TLSv1.2 (OUT), TLS header, Supplemental data (23): Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Info: TLSv1.2 (OUT), TLS header, Supplemental data (23): Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Info: TLSv1.2 (OUT), TLS header, Supplemental data (23): Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Info: TLSv1.2 (OUT), TLS header, Supplemental data (23): Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Info: TLSv1.2 (OUT), TLS header, Supplemental data (23): Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Info: TLSv1.2 (OUT), TLS header, Supplemental data (23): Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Info: TLSv1.2 (OUT), TLS header, Supplemental data (23): Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Info: TLSv1.2 (OUT), TLS header, Supplemental data (23): Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Info: TLSv1.2 (OUT), TLS header, Supplemental data (23): Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Info: TLSv1.2 (OUT), TLS header, Supplemental data (23): Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Info: TLSv1.2 (OUT), TLS header, Supplemental data (23): Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Info: TLSv1.2 (OUT), TLS header, Supplemental data (23): Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Info: TLSv1.2 (OUT), TLS header, Supplemental data (23): Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Info: TLSv1.2 (OUT), TLS header, Supplemental data (23): Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Info: TLSv1.2 (OUT), TLS header, Supplemental data (23): Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Info: TLSv1.2 (OUT), TLS header, Supplemental data (23): Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Info: TLSv1.2 (OUT), TLS header, Supplemental data (23): Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Info: TLSv1.2 (OUT), TLS header, Supplemental data (23): Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Info: TLSv1.2 (OUT), TLS header, Supplemental data (23): Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Info: TLSv1.2 (OUT), TLS header, Supplemental data (23): Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Info: TLSv1.2 (OUT), TLS header, Supplemental data (23): Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Info: TLSv1.2 (OUT), TLS header, Supplemental data (23): Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Info: TLSv1.2 (OUT), TLS header, Supplemental data (23): Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Info: We are completely uploaded and fine Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Info: TLSv1.2 (IN), TLS header, Supplemental data (23): Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Info: Mark bundle as not supporting multiuse Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Received header from server: HTTP/1.1 500 Internal Server Error Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Received header from server: Date: Tue, 14 May 2024 15:35:22 GMT Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Received header from server: Server: Apache/2.4.52 (Ubuntu) Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Received header from server: Strict-Transport-Security: max-age=31536000; includeSubDomains Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Received header from server: Content-Length: 618 Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Received header from server: Connection: close Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Received header from server: Content-Type: text/html; charset=iso-8859-1 Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Received header from server: Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Received header from server: <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN"> Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Received header from server: <html><head> Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Received header from server: <title>500 Internal Server Error</title> Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Received header from server: </head><body> Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Received header from server: <h1>Internal Server Error</h1> Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Received header from server: <p>The server encountered an internal error or Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Received header from server: misconfiguration and was unable to complete Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Received header from server: your request.</p> Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Received header from server: <p>Please contact the server administrator at Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Received header from server: webmaster@localhost to inform them of the time this error occurred, Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Received header from server: and the actions you performed just before this error.</p> Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Received header from server: <p>More information about this error may be available Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Received header from server: in the server error log.</p> Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Received header from server: <hr> Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Received header from server: <address>Apache/2.4.52 (Ubuntu) Server at milkyway.cs.rpi.edu Port 443</address> Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Received header from server: </body></html> Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Info: Closing connection 241 Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Info: TLSv1.2 (IN), TLS header, Supplemental data (23): Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Info: TLSv1.3 (IN), TLS alert, close notify (256): Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Info: TLSv1.2 (OUT), TLS header, Supplemental data (23): Tue 14 May 2024 11:35:22 AM EDT | Milkyway@home | [http] [ID#1] Info: TLSv1.3 (OUT), TLS alert, close notify (256): Tue 14 May 2024 11:35:23 AM EDT | Milkyway@home | Scheduler request failed: HTTP internal server error Tue 14 May 2024 11:35:23 AM EDT | Milkyway@home | [sched_op] Deferring communication for 02:10:27 Tue 14 May 2024 11:35:23 AM EDT | Milkyway@home | [sched_op] Reason: Scheduler request failed |
Send message Joined: 18 Nov 22 Posts: 84 Credit: 640,530,847 RAC: 0 |
as a followup, even schedule requests which do not report or request any work, fail in the same way. Doesnt seem to be ISP related another system on the same network has not hit this problem yet the use of a VPN on the trouble system still results in the same connection problems. some kind of error is happening on the project server during the scheduler requests. can you please check the logs? |
Send message Joined: 9 Aug 22 Posts: 82 Credit: 2,848,179 RAC: 6,872 |
The scheduler logs outputted this: 2024-05-14 16:17:43.0710 [PID=2494385] [CRITICAL] scanned old_key_i files, can find client's key 2024-05-14 16:17:43.6852 [PID=2494387] [CRITICAL] scanned old_key_i files, can find client's key |
Send message Joined: 18 Nov 22 Posts: 84 Credit: 640,530,847 RAC: 0 |
do you know what "key" it's looking for? is this error in response to my specific request? or you just see a lot of these errors in general? any idea for a resolution? |
Send message Joined: 9 Aug 22 Posts: 82 Credit: 2,848,179 RAC: 6,872 |
do you know what "key" it's looking for? is this error in response to my specific request? or you just see a lot of these errors in general? I have never seen this problem so I am not sure what is going on. I tried looking into it but couldn't find a solution. I will keep trying to find out what is going on. |
Send message Joined: 24 Jan 11 Posts: 715 Credit: 555,429,972 RAC: 38,307 |
The message is from here: https://github.com/BOINC/boinc/blob/308bac62d39ac5c85468a365ba01df6ae8751327/sched/handle_request.cpp#L908 Part of the handle_request.cpp module in /sched directory. The function that it is part of starts here: https://github.com/BOINC/boinc/blob/308bac62d39ac5c85468a365ba01df6ae8751327/sched/handle_request.cpp#L867 and seems to do management of file signatures. It seems to get invoked when the scheduler can't find the signatures of files in the host and checks to see if the host possibly had been reset generating a new hostID. |
©2024 Astroinformatics Group