summaryrefslogtreecommitdiffhomepage
path: root/control/controlhttp
diff options
context:
space:
mode:
authorAndrew Dunham <andrew@du.nham.ca>2024-09-26 13:41:37 -0400
committerAndrew Dunham <andrew@du.nham.ca>2024-09-26 13:41:37 -0400
commit3df4df870b8d4665f27693deffdde3dff10bdc42 (patch)
treef332a2420b65f24d03e05a831c2b245d9fca6bf3 /control/controlhttp
parent5550a17391f1a8765f7f56831c759ee862b6b434 (diff)
downloadtailscale-andrew/noise-conn-test.tar.xz
tailscale-andrew/noise-conn-test.zip
control/controlhttp: test that a control conn is healthy before usingandrew/noise-conn-test
We've seen a bunch of cases where a captive portal or network with a firewall will allow a connection to the control server, successfully perform the Noise upgrade, but then fail immediately after trying to send any data on that now-upgraded connection. In many of these cases, we only see this behaviour on the plaintext connection over port 80. This interacts poorly with our controlhttp.Dialer's logic, which first tries to dial over port 80 (to avoid the overhead of double-encrypting, Noise and TLS), and only falls back to port 443/TLS if the connnection cannot be established or upgraded. In such cases, we'd essentially fail to connect to the control server entirely since we'd never fall back to port 443 (since the Noise upgrade succeeded) but we'd get an EOF when trying to do anything with that connection. This could be solved with the TS_FORCE_NOISE_443 envknob, but that's not a great experience for our users. Updates #13597 Signed-off-by: Andrew Dunham <andrew@du.nham.ca> Change-Id: I223dec0ae11b8f2946e3fb78dc49fcffc62470f3
Diffstat (limited to 'control/controlhttp')
-rw-r--r--control/controlhttp/client.go15
-rw-r--r--control/controlhttp/constants.go7
2 files changed, 22 insertions, 0 deletions
diff --git a/control/controlhttp/client.go b/control/controlhttp/client.go
index e01cb1f9a..522318c0e 100644
--- a/control/controlhttp/client.go
+++ b/control/controlhttp/client.go
@@ -310,6 +310,21 @@ func (a *Dialer) dialHost(ctx context.Context, addr netip.Addr) (*ClientConn, er
if debugNoiseDial() {
a.logf("noise dial (%v, %v) = (%v, %v)", u, addr, cbConn, err)
}
+
+ // We've seen some networks where the connection upgrades
+ // successfully, but then fails when we make a request after
+ // the upgrade. Work around this by making a request over the
+ // now-upgraded connection before we tell the outer function
+ // that we've got a connection.
+ if err == nil && a.TestConn != nil {
+ err = a.TestConn(cbConn)
+ if err != nil {
+ // Close and don't leak the connection.
+ cbConn.Close()
+ cbConn = nil
+ }
+ }
+
select {
case ch <- tryURLRes{u, cbConn, err}:
case <-ctx.Done():
diff --git a/control/controlhttp/constants.go b/control/controlhttp/constants.go
index 6b5116262..ffa61edec 100644
--- a/control/controlhttp/constants.go
+++ b/control/controlhttp/constants.go
@@ -88,6 +88,13 @@ type Dialer struct {
// plan before falling back to DNS.
DialPlan *tailcfg.ControlDialPlan
+ // TestConn, if non-nil, is called with a dialed connection to verify
+ // that it's ready to serve real requests. If this function returns an
+ // error, the connection is closed and not used. If this function
+ // returns an error for all dialed connections, an error is returned
+ // from Dial.
+ TestConn func(*ClientConn) error
+
proxyFunc func(*http.Request) (*url.URL, error) // or nil
// For tests only