diff options
| author | Andrew Dunham <andrew@du.nham.ca> | 2024-09-26 13:41:37 -0400 |
|---|---|---|
| committer | Andrew Dunham <andrew@du.nham.ca> | 2024-09-26 13:41:37 -0400 |
| commit | 3df4df870b8d4665f27693deffdde3dff10bdc42 (patch) | |
| tree | f332a2420b65f24d03e05a831c2b245d9fca6bf3 /control/controlhttp | |
| parent | 5550a17391f1a8765f7f56831c759ee862b6b434 (diff) | |
| download | tailscale-andrew/noise-conn-test.tar.xz tailscale-andrew/noise-conn-test.zip | |
control/controlhttp: test that a control conn is healthy before usingandrew/noise-conn-test
We've seen a bunch of cases where a captive portal or network with a
firewall will allow a connection to the control server, successfully
perform the Noise upgrade, but then fail immediately after trying to
send any data on that now-upgraded connection. In many of these cases,
we only see this behaviour on the plaintext connection over port 80.
This interacts poorly with our controlhttp.Dialer's logic, which first
tries to dial over port 80 (to avoid the overhead of double-encrypting,
Noise and TLS), and only falls back to port 443/TLS if the connnection
cannot be established or upgraded.
In such cases, we'd essentially fail to connect to the control server
entirely since we'd never fall back to port 443 (since the Noise upgrade
succeeded) but we'd get an EOF when trying to do anything with that
connection. This could be solved with the TS_FORCE_NOISE_443 envknob,
but that's not a great experience for our users.
Updates #13597
Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: I223dec0ae11b8f2946e3fb78dc49fcffc62470f3
Diffstat (limited to 'control/controlhttp')
| -rw-r--r-- | control/controlhttp/client.go | 15 | ||||
| -rw-r--r-- | control/controlhttp/constants.go | 7 |
2 files changed, 22 insertions, 0 deletions
diff --git a/control/controlhttp/client.go b/control/controlhttp/client.go index e01cb1f9a..522318c0e 100644 --- a/control/controlhttp/client.go +++ b/control/controlhttp/client.go @@ -310,6 +310,21 @@ func (a *Dialer) dialHost(ctx context.Context, addr netip.Addr) (*ClientConn, er if debugNoiseDial() { a.logf("noise dial (%v, %v) = (%v, %v)", u, addr, cbConn, err) } + + // We've seen some networks where the connection upgrades + // successfully, but then fails when we make a request after + // the upgrade. Work around this by making a request over the + // now-upgraded connection before we tell the outer function + // that we've got a connection. + if err == nil && a.TestConn != nil { + err = a.TestConn(cbConn) + if err != nil { + // Close and don't leak the connection. + cbConn.Close() + cbConn = nil + } + } + select { case ch <- tryURLRes{u, cbConn, err}: case <-ctx.Done(): diff --git a/control/controlhttp/constants.go b/control/controlhttp/constants.go index 6b5116262..ffa61edec 100644 --- a/control/controlhttp/constants.go +++ b/control/controlhttp/constants.go @@ -88,6 +88,13 @@ type Dialer struct { // plan before falling back to DNS. DialPlan *tailcfg.ControlDialPlan + // TestConn, if non-nil, is called with a dialed connection to verify + // that it's ready to serve real requests. If this function returns an + // error, the connection is closed and not used. If this function + // returns an error for all dialed connections, an error is returned + // from Dial. + TestConn func(*ClientConn) error + proxyFunc func(*http.Request) (*url.URL, error) // or nil // For tests only |
