PSA on websocket timeouts

Public service announcement regarding WebSocket timeouts.
TL;DR: WebSocket connections close after 1 minute of inactivity.

So I have been plagued by a connection issue for the last month and decided to track down the problem.

I was seeing this in Chrome dev tools:

After 1 minute the blue line goes away and it reports "Content Downloaded". It took me a while to figure out that this basically means the WebSocket connection closed.

My session management scheme was to refresh the token every 4 minutes, which also refreshes the WebSocket connection. The problem with this interval is that the connection is closed after 1 minute of inactivity, well before the 4-minute refresh ever runs. I would then get access-denied errors, because when the connection closed, Resgate dropped the authentication information tied to that connection ID.

There are at least three ways to solve this:

  1. Implement a ping to keep the connection open. You can do this by periodically calling a ping or heartbeat method on a Resgate service (it might be worth adding this to the protocol).

  2. In my case I just adjusted my token refresh interval from 4 minutes down to 58 seconds, which works for now. I might implement a heartbeat as we scale if I start to notice performance issues.

  3. Use the client.setOnConnect method to refresh the connection's authentication information. This didn't work for me because I set aggressive token timeouts: tokens expire after 5 minutes if they aren't used, so if someone waited 6 minutes before doing anything on the site, the onConnect re-authentication would log them out. I opted for the interval refresh to keep the token alive instead; the token now expires after 1 minute, which is probably better for security anyway. (A sketch of both the interval refresh and the setOnConnect approach follows this list.)
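For reference, here is a minimal sketch of options 2 and 3 using the ResClient JavaScript library. The authService resource ID, the renewToken method, and the getToken helper are placeholders from my own setup, not anything defined by Resgate, so substitute whatever your auth service actually exposes.

import ResClient from 'resclient';

const client = new ResClient('wss://example.com/ws');

// Hypothetical helper; replace with however you store your token.
function getToken() {
  return localStorage.getItem('access_token');
}

// Option 3: re-authenticate every time the WebSocket (re)connects.
client.setOnConnect(c =>
  c.authenticate('authService', 'renewToken', { token: getToken() })
);

// Option 2: refresh the token on an interval shorter than the 60 second
// idle timeout, which also keeps traffic flowing over the connection.
setInterval(() => {
  client.authenticate('authService', 'renewToken', { token: getToken() })
    .catch(err => console.error('token refresh failed', err));
}, 58 * 1000);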

Anyway, I thought I would put out a PSA just in case anyone else runs into this and doesn't want to waste time trying to figure it out.

Hi Greg!

Resgate/Chrome will not time out an idle connection. Try going to the Live Demo on resgate.io and check Chrome Dev Tools, and you will see that the WebSocket connections to api.resgate.io are kept alive without additional pinging.

Your issue is most likely with your reverse proxy.
Are you using Nginx? Nginx is also used for api.resgate.io. What you need to do is set the proxy_read_timeout directive:

proxy_read_timeout 86400;

The Nginx.org page on WebSocket proxying says:

By default, the connection will be closed if the proxied server does not transmit any data within 60 seconds. This timeout can be increased with the proxy_read_timeout directive. Alternatively, the proxied server can be configured to periodically send WebSocket ping frames to reset the timeout and check if the connection is still alive.
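Outside of Kubernetes, in a plain Nginx config, the relevant part of a WebSocket proxy location would look roughly like this. The /ws path is just a placeholder, and I am assuming Resgate is listening on its default port 8080:

location /ws {
    proxy_pass http://127.0.0.1:8080;

    # Required for the WebSocket upgrade handshake
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";

    # Raise the 60 second default idle timeout
    proxy_read_timeout 86400;
    proxy_send_timeout 86400;
}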

That said, adding a setting that enables WebSocket ping frames from Resgate might still be a nice feature.

Best regards,
Samuel

Ah, that could indeed be the cause. I do use the Nginx ingress on Kubernetes.


OK, I spent some time adjusting my Nginx ingress… still getting closed connections. Here is my Nginx conf for Resgate:

location /ws {
		
		set $namespace      "default";
		set $ingress_name   "resgate";
		set $service_name   "resgate";
		set $service_port   "8080";
		set $location_path  "/ws";
		
		rewrite_by_lua_block {
			lua_ingress.rewrite({
				force_ssl_redirect = false,
				ssl_redirect = true,
				force_no_ssl_redirect = false,
				use_port_in_redirects = false,
			})
			balancer.rewrite()
			plugins.run()
		}
		
		# be careful with `access_by_lua_block` and `satisfy any` directives as satisfy any
		# will always succeed when there's `access_by_lua_block` that does not have any lua code doing `ngx.exit(ngx.DECLINED)`
		# other authentication method such as basic auth or external auth useless - all requests will be allowed.
		#access_by_lua_block {
		#}
		
		header_filter_by_lua_block {
			lua_ingress.header()
			plugins.run()
		}
		
		body_filter_by_lua_block {
		}
		
		log_by_lua_block {
			balancer.log()
			
			monitor.call()
			
			plugins.run()
		}
		
		port_in_redirect off;
		
		set $balancer_ewma_score -1;
		set $proxy_upstream_name "default-resgate-8080";
		set $proxy_host          $proxy_upstream_name;
		set $pass_access_scheme  $scheme;
		
		set $pass_server_port    $server_port;
		
		set $best_http_host      $http_host;
		set $pass_port           $pass_server_port;
		
		set $proxy_alternative_upstream_name "";
		
		client_max_body_size                    1m;
		
		proxy_set_header Host                   $best_http_host;
		
		# Pass the extracted client certificate to the backend
		
		# Allow websocket connections
		proxy_set_header                        Upgrade           $http_upgrade;
		
		proxy_set_header                        Connection        $connection_upgrade;
		
		proxy_set_header X-Request-ID           $req_id;
		proxy_set_header X-Real-IP              $remote_addr;
		
		proxy_set_header X-Forwarded-For        $remote_addr;
		
		proxy_set_header X-Forwarded-Host       $best_http_host;
		proxy_set_header X-Forwarded-Port       $pass_port;
		proxy_set_header X-Forwarded-Proto      $pass_access_scheme;
		
		proxy_set_header X-Scheme               $pass_access_scheme;
		
		# Pass the original X-Forwarded-For
		proxy_set_header X-Original-Forwarded-For $http_x_forwarded_for;
		
		# mitigate HTTPoxy Vulnerability
		# https://www.nginx.com/blog/mitigating-the-httpoxy-vulnerability-with-nginx/
		proxy_set_header Proxy                  "";
		
		# Custom headers to proxied server
		
		proxy_connect_timeout                   3600s;
		proxy_send_timeout                      3600s;
		proxy_read_timeout                      3600s;
		
		proxy_buffering                         off;
		proxy_buffer_size                       4k;
		proxy_buffers                           4 4k;
		
		proxy_max_temp_file_size                1024m;
		
		proxy_request_buffering                 on;
		proxy_http_version                      1.1;
		
		proxy_cookie_domain                     off;
		proxy_cookie_path                       off;
		
		# In case of errors try the next upstream server before returning an error
		proxy_next_upstream                     error timeout;
		proxy_next_upstream_timeout             0;
		proxy_next_upstream_tries               3;
		
		proxy_pass http://upstream_balancer;
		
		proxy_redirect                          off;
		
	}

Back from a computer/Internet-free week of camping :slight_smile:

The config looks fine to me, though I must admit I haven't used Nginx as a Kubernetes ingress myself; I have been using Traefik instead.

I still believe it is Nginx causing the issue though.

Have you tried setting the timeout through Kubernetes annotations for nginx?

Maybe this Stack Overflow question can help: kubernetes - increase proxy_send_timeout and proxy_read_timeout ingress nginx - Stack Overflow
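I haven't run this against your cluster myself, but with the community ingress-nginx controller the timeouts are normally set per Ingress through annotations rather than by editing the generated config, something along these lines (values in seconds):

metadata:
  annotations:
    nginx.ingress.kubernetes.io/proxy-read-timeout: "3600"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "3600"

These go on the Ingress resource that routes /ws to Resgate, and the controller should then render them into the proxy_read_timeout and proxy_send_timeout directives you already see in the generated location block.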

/Samuel