Age | Commit message | Author | Files | Lines
2026-04-13tsnet,magicsock: mark tests as flaky on darwin onlyapenwarr/flakeAvery Pennarun2-0/+11
The tests TestLoopbackLocalAPI, TestLoopbackSOCKS5, and TestTwoDevicePing were previously marked as flaky globally but had their marks removed after fixes. However, they appear to still be flaky specifically on macOS. Re-add flaky marks conditionally for darwin only, allowing the tests to run normally on Linux and Windows where they pass reliably. Change-Id: I7b81a16a12437c9aa55f2c9e9fde08d39499cabe Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Avery Pennarun <apenwarr@tailscale.com>
2026-04-13tstest/archtest: skip qemu test on race builder when qemu unavailableAvery Pennarun1-7/+8
The previous commit removed the !race build constraint, intending for the test to skip gracefully when qemu isn't installed. However, the race builder in CI doesn't have qemu installed (to save time), so the test fails there. Instead of using a build constraint, check racebuild.On at runtime to allow graceful skipping specifically on the race builder while still requiring qemu on other CI builders. Change-Id: I6690e2cd313b297240ecfaaf3439c4a2c24b33dd Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Avery Pennarun <apenwarr@tailscale.com>
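A minimal sketch of the runtime decision described above, not the actual archtest code. The `raceBuild` constant stands in for `racebuild.On`, and the `decide` helper and the `qemu-system-x86_64` binary name are hypothetical, chosen only for illustration.

```go
package main

import (
	"fmt"
	"os/exec"
)

// raceBuild is a stand-in for racebuild.On (assumption: a plain bool that
// reports whether the binary was built with the race detector).
const raceBuild = false

type decision int

const (
	runTest  decision = iota // qemu is present: run the test
	skipTest                 // race builder without qemu: skip gracefully
	failTest                 // any other builder without qemu: fail loudly
)

// decide implements the policy from the commit message: qemu stays
// mandatory everywhere except on the race builder.
func decide(haveQemu, onRaceBuilder bool) decision {
	switch {
	case haveQemu:
		return runTest
	case onRaceBuilder:
		return skipTest
	default:
		return failTest
	}
}

func main() {
	// Hypothetical binary name; the real test may look for a different one.
	_, err := exec.LookPath("qemu-system-x86_64")
	switch decide(err == nil, raceBuild) {
	case runTest:
		fmt.Println("qemu found: run the test")
	case skipTest:
		fmt.Println("race builder without qemu: skip")
	case failTest:
		fmt.Println("qemu missing on a regular builder: fail")
	}
}
```

In a real test the three outcomes would map to running the body, `t.Skip`, and `t.Fatal` respectively.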
2026-04-13gofmtAvery Pennarun3-17/+17
Change-Id: I7ef4a4b082f1ec73816a735b27f845d55f4ecd0b Signed-off-by: Avery Pennarun <apenwarr@tailscale.com>
2026-04-13remove unused flakytest imports from test filesAvery Pennarun2-2/+0
Change-Id: Ie50f99b84939b8e5727916ff9f0c9705500da246 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Avery Pennarun <apenwarr@tailscale.com>
2026-04-13docs: add flaky test fix reports (to be reverted)Avery Pennarun3-0/+518
Add documentation of the flaky test investigation and fixes: - fixed-tests-report.md: detailed breakdown of all fixes by tier - more-fixes-plan.md: root cause analysis and verification steps These files document the work done but should be reverted before merging to main. Change-Id: Ib0a2a787aaba2ef8b47475667db4677639c09645 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Avery Pennarun <apenwarr@tailscale.com>
2026-04-13remove flakytest.Mark for tests that are now fixedAvery Pennarun3-8/+0
Remove flaky test markers from tests whose root causes have been identified and fixed:
- TestNodeAddressIPFields (issue #7008)
- TestClientSideJailing (issue #17419)
- TestNATPing (issue #12169)
- TestPeerRelayPing (issue #17251)
- TestLoopbackLocalAPI (issue #8557)
- TestLoopbackSOCKS5 (issue #8198)
- TestTwoDevicePing (issue #11762)
These tests now pass consistently under repeated runs with -count=N and with the race detector enabled.
Change-Id: Iebd47a8e0838612bae23aeb146cba7ef94582c76
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Avery Pennarun <apenwarr@tailscale.com>
2026-04-13fix flaky tests: improve test isolation and reliabilityAvery Pennarun19-102/+295
Test fixes for consistent pass rates under repeated runs (-count=N):
Global state isolation:
- appc: check metric deltas instead of absolute values
- cmd/derper: initialize DNS caches before test
- tsweb/varz: use unpublished expvar to avoid global registration
- wgengine/netstack: check metric deltas instead of absolute values
- net/dns: use SetForTest() with deferred restore for hooks
Timeout and concurrency:
- cmd/containerboot: increase wait loop timeouts for parallel load
- tsconsensus: add deadline to waitFor(), use sync.Once for netns
- tstest/integration: add tstest.Shard/Parallel, fix IPN bus watchers
- net/netcheck: set testCaptivePortalDelay to prevent hangs
- wgengine/magicsock: use Port:0, add timeout to callback wait
- drive/driveimpl: use http.Server for proper shutdown
Race conditions:
- tstest/archtest: remove !race from build constraint
- util/deephash: use local sink variable instead of package-level
- net/art: switch to math/rand/v2 for thread-safe globals
- tstest/integration: use Status() instead of MustStatus() from goroutines
Test optimization:
- net/udprelay: rewrite VNI test to avoid iterating all 16M values
- ipn/ipnlocal: reset env vars between subtests
- cmd/containerboot/serve: use SetWaitDurationForTest
- tsnet: wait for service VIP in AllowedIPs before dialing
Change-Id: Id6186fb8a45031920550a208ded77382e57cc016
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Avery Pennarun <apenwarr@tailscale.com>
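The "check metric deltas instead of absolute values" items can be illustrated with a small sketch. The `requests` counter and `handle` function are hypothetical; the point is only that a package-level counter keeps its value across `-count=N` iterations, so an assertion on the absolute value passes once and then fails.

```go
package main

import (
	"expvar"
	"fmt"
)

// requests is a package-level counter shared by everything in the binary.
// It is created with new(expvar.Int) rather than expvar.NewInt, so it is
// never published in the global expvar registry.
var requests = new(expvar.Int)

// handle is a hypothetical operation under test that bumps the counter.
func handle() { requests.Add(1) }

func main() {
	// Simulate an earlier test (or a previous -count iteration) that
	// already touched the counter.
	handle()

	// Fragile: asserting requests.Value() == 1 here would fail.
	// Robust: record the starting value and assert on the delta.
	before := requests.Value()
	handle()
	delta := requests.Value() - before
	fmt.Println("delta:", delta)
}
```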
2026-04-13fix flaky tests: make code more testable and fix race conditionsAvery Pennarun5-8/+38
Changes to support flaky test fixes: - kube/services: add SetWaitDurationForTest() to avoid hardcoded 20s wait - net/netcheck: check context before captive portal detection - net/captivedetection: stop timer before waiting on channel - util/eventbus/eventbustest: use sync.Once to prevent double-close - cmd/tailscale/cli: fix noDupFlagify to handle repeated test runs Change-Id: I9c4af6996c7ff809df5fa8211c4de39a0a9183c9 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Avery Pennarun <apenwarr@tailscale.com>
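The sync.Once item can be sketched as follows. The `watcher` type is hypothetical and is not the eventbustest code; it only shows why guarding `close` matters, since closing an already-closed channel panics.

```go
package main

import (
	"fmt"
	"sync"
)

// watcher is a hypothetical type that owns a done channel.
type watcher struct {
	done      chan struct{}
	closeOnce sync.Once
}

func newWatcher() *watcher {
	return &watcher{done: make(chan struct{})}
}

// Close may be called any number of times, from any goroutine.
// Without closeOnce, the second call would panic with
// "close of closed channel".
func (w *watcher) Close() {
	w.closeOnce.Do(func() { close(w.done) })
}

func main() {
	w := newWatcher()
	w.Close()
	w.Close() // safe: the close runs only once
	<-w.done
	fmt.Println("closed without panic")
}
```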
2026-04-13tstest/integration: use hard links for binary copies to avoid /tmp exhaustionAvery Pennarun1-7/+8
When running many parallel integration tests, each subtest copies ~70MB of binaries (tailscale + tailscaled) to its temp directory. With 24+ parallel subtests, this quickly exhausts /tmp space, causing "no space left on device" errors that appear as test flakes. Try hard linking first before falling back to copying. Hard links share the same inode and don't consume additional disk space. Change-Id: I8cddd993af40f99a34f9e994b2f5a6f5daf294bf Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Avery Pennarun <apenwarr@tailscale.com>
2026-04-13cmd/deflake: add tool for detecting flaky testsAvery Pennarun2-0/+717
Deflake is a tool that runs tests repeatedly to identify flaky tests. It integrates with the existing test infrastructure to:
- Run tests with configurable iteration count (-count flag)
- Detect tests that pass inconsistently
- Handle race detector testing (-race flag)
- Skip Example tests (Go runs them exactly once regardless of -count)
- Use isolated TMPDIR per run to prevent /tmp exhaustion
- Clean up old Test* directories from crashed tests
- Set proper timeout hierarchy (go test -timeout > context timeout)
Usage:
    go build ./cmd/deflake
    ./deflake -packages=./... -count=10
    ./deflake -packages=./tstest/integration -count=20 -race
Change-Id: I4a36314e92197feb8f860a6e2c0b5b0202ce2915
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Avery Pennarun <apenwarr@tailscale.com>
2026-04-13tstest: fix kernel version parsing for Debian-style version stringsAvery Pennarun1-3/+6
The kernel version parser used strings.Cut with "-" to handle versions like "5.4.0-76-generic", but Debian uses "+" in versions like "6.12.41+deb13-amd64". Use strings.IndexAny to find the first "-" or "+" and truncate there. Fixes TestKernelVersion on Debian systems. Change-Id: I70e5f95682d54baf908e51f9f4b51c130b00aaaa Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Avery Pennarun <apenwarr@tailscale.com>
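The truncation step can be sketched like this. The function name `baseKernelVersion` is hypothetical; only the use of strings.IndexAny with "-" and "+" comes from the commit message.

```go
package main

import (
	"fmt"
	"strings"
)

// baseKernelVersion cuts a kernel release string at the first '-' or '+',
// leaving only the dotted numeric part. Strings with neither separator
// are returned unchanged.
func baseKernelVersion(release string) string {
	if i := strings.IndexAny(release, "-+"); i >= 0 {
		return release[:i]
	}
	return release
}

func main() {
	for _, r := range []string{
		"5.4.0-76-generic",    // Ubuntu-style suffix
		"6.12.41+deb13-amd64", // Debian-style suffix
		"6.1.0",               // no suffix
	} {
		fmt.Printf("%s => %s\n", r, baseKernelVersion(r))
	}
}
```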
2026-04-13tstest/integration: clear SSH_CLIENT env to prevent false positive detectionAvery Pennarun1-0/+1
When running integration tests over SSH (e.g., in remote development environments), the SSH_CLIENT environment variable is set. This causes isSSHOverTailscale() to incorrectly detect an SSH session and change behavior. Clear SSH_CLIENT in the test node environment to prevent these false positives. Change-Id: I1411abf0be9704cce37051476efb04d59beed386 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Avery Pennarun <apenwarr@tailscale.com>
2026-04-11tstest/tailmac: add headless mode for automated VM testingBrad Fitzpatrick5-10/+36
Add a --headless flag to the Host.app Run subcommand for running macOS VMs without a GUI, enabling use from test frameworks. Key changes: - HostCli.swift: When --headless is set, run the VM via VMController + RunLoop.main.run() instead of NSApplicationMain. Using the RunLoop (not dispatchMain) is required because VZ framework callbacks depend on RunLoop sources. - VMController.swift: Add headless parameter to createVirtualMachine that configures a single socket-based NIC (no NAT NIC). This matches the NIC configuration used when creating/saving VMs, so saved state restoration works correctly. A NIC count mismatch causes VZ to silently fail to execute guest code. - TailMacConfigHelper.swift: Clean up socket network device logging. - Config.swift: Move VM storage from ~/VM.bundle to ~/.cache/tailscale/vmtest/macos/. - TailMac.swift: Fix dispatchMain→RunLoop.main.run() in the create command (same VZ RunLoop requirement). Updates #13038 Change-Id: Iea51c043aa92e8fc6257139b9f0e2e7677072fa2 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-04-10gokrazy: add arm64 natlab appliance image supportBrad Fitzpatrick7-3/+27
Add natlabapp.arm64 config and gokrazydeps.go for building a gokrazy natlab appliance image targeting arm64 (Apple Silicon). This is the arm64 counterpart to the existing natlabapp (amd64) used by vmtest. The arm64 image uses github.com/gokrazy/kernel.arm64 and is built with "make natlab-arm64" in the gokrazy directory. Updates #13038 Change-Id: I0e1f8e5840083a5de5954f2cf46e3babec129d96 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-04-10.github, tool/listpkgs: automatically find tests which use tstest.RequireRootBrad Fitzpatrick5-11/+82
Updates tailscale/corp#40007 Change-Id: I677d3d9e276cb6633a14ac07e4b58ea08e52fac4 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-04-10cmd/derper,derp: add --rate-config file with SIGHUP reload (#19314)Mike O'Driscoll3-52/+412
Add a --rate-config flag pointing to a JSON file for per-client receive rate limits (bytes/sec and burst bytes). The config is reloaded on SIGHUP, updating all existing client connections live. The --per-client-rate-limit and --per-client-rate-burst flags are removed in favor of the config file. In derpserver, rate limiting uses an atomic.Pointer[xrate.Limiter] per client: nil when unlimited or mesh (zero overhead), non-nil when rate-limited. Document that clientSet.activeClient Store operations require Server.mu. Updates tailscale/corp#38509 Signed-off-by: Mike O'Driscoll <mikeo@tailscale.com>
2026-04-10wgengine/router/osrouter: fix privileged tests missing fake netfilter runnerAmal Bansode1-0/+4
These test failures were never caught by CI because the package in question was missing from our privileged tests list. tailscale/corp#40007 covers improving our process around this. Fixes #19316 Signed-off-by: Amal Bansode <amal@tailscale.com>
2026-04-10tstest: add RequireRoot helperBrad Fitzpatrick4-18/+15
Start using a common helper for tests to declare that they require root. This is step 1. A later step will then make this helper track which tests were skipped so a subsequent pass will run these tests as root. Updates tailscale/corp#40007 Change-Id: I4979e1def0fa3691d38c83f48c89aaa443e7f62e Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-04-10tka: Revert "improve logging for Compact and Commit operations"Alex Chan2-13/+0
This reverts commit b25920dfc07452833895ad00b42db7e581b3cec8. The `log.Printf` messages are causing panics in corp, in particular: > panic: please use tailscale.com/logger.Logf instead of the log package Fixing the TKA code to plumb through a logger properly is going to be a hassle, so for now remove these logs to unblock merges to corp. Updates tailscale/corp#39455 Signed-off-by: Alex Chan <alexc@tailscale.com>
2026-04-10tka: keep the CompactionDefaults alongside the other limitsAlex Chan3-7/+19
Updates #cleanup Change-Id: Ib5e481d5a9c7ec7ac3e6b3913909ab1bf21d7a4d Signed-off-by: Alex Chan <alexc@tailscale.com>
2026-04-09ipn/ipnlocal: add netmap mutations to the ipn bus (#19120)Jonathan Nobels4-13/+242
ipn/local: add netmap mutations to the ipn bus. Updates tailscale/tailscale#1909 This adds a new NotifyWatchOpt that allows watchers to receive PeerChange events (derived from node mutations) on the IPN bus in lieu of a complete netmap. We'll continue to send the full netmap for any map response that includes it, but for mutations, sending PeerChange events gives the client the option to manage its own models more selectively and cuts way down on JSON serialization overhead. On chatty tailnets, this will vastly reduce the amount of chatter on the bus. This change should be backwards compatible; it is purely additive. Clients that subscribe to NotifyNetmap will get the full netmap for every delta. New clients can omit that and instead opt into NotifyPeerChanges. Signed-off-by: Jonathan Nobels <jonathan@tailscale.com>
2026-04-09cmd/k8s-operator: set PreferDualStack on ProxyGroup egress services (#19194)Fernando Serboncini2-3/+5
On dual-stack clusters defaulting to IPv6, the ProxyGroup egress service only got an IPv6 address, which causes request failures. Individual egress proxies already set PreferDualStack correctly. Fixes: #18768 Signed-off-by: Fernando Serboncini <fserb@tailscale.com>
2026-04-09ssh/tailssh: fix default PATH for DebianAndrew Dunham1-1/+1
Validated against a modern Debian install, fixes a typo. Updates #cleanup Signed-off-by: Andrew Dunham <andrew@du.nham.ca> Change-Id: I7b26012f54dbd2f0f9fea98722e8edc2fe97645a
2026-04-09tstest/natlab: add TestSubnetRouterFreeBSD with FreeBSD cloud image supportBrad Fitzpatrick6-39/+163
As a warm-up to making natlab support multiple operating systems, start with an easy one (in that it's also Unixy and open source like Linux) and add FreeBSD 15.0 as a VM OS option for the vmtest integration test framework, and add TestSubnetRouterFreeBSD which tests subnet routing through a FreeBSD VM (Gokrazy → FreeBSD → Gokrazy). Key changes: - Add FreeBSD150 OSImage using the official FreeBSD 15.0 BASIC-CLOUDINIT cloud image (xz-compressed qcow2) - Add GOOS()/IsFreeBSD() methods to OSImage for cross-compilation and OS-specific behavior - Handle xz-compressed image downloads in ensureImage - Refactor compileBinaries into compileBinariesForOS to support multiple GOOS targets (linux, freebsd), with binaries registered at <goos>/<name> paths on the file server VIP - Add FreeBSD-specific cloud-init (nuageinit) user-data generation: string-form runcmd (nuageinit doesn't support YAML arrays), fetch(1) instead of curl, FreeBSD sysctl names for IP forwarding, mkdir /usr/local/bin, PATH setup for tta - Skip network-config in cidata ISO for FreeBSD (DHCP via rc.conf) Updates tailscale/tailscale#13038 Change-Id: Ibeb4f7d02659d5cd8e3a7c3a66ee7b1a92a0110d Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-04-09cmd/k8s-operator: migrate to tailscale-client-go-v2 (#19010)David Bond33-933/+909
This commit modifies the kubernetes operator to use the `tailscale-client-go-v2` package instead of the internal tailscale client it was previously using. This now gives us the ability to expand out custom resources and features as they become available via the API module. The tailnet reconciler has also been modified to manage clients as tailnets are created and removed, providing each subsequent reconciler with a single `ClientProvider` that obtains a tailscale client for the respective tailnet by name, or the operator's default when presented with a blank string. Fixes: https://github.com/tailscale/corp/issues/38418 Signed-off-by: David Bond <davidsbond93@gmail.com>
2026-04-09tka: improve logging for Compact and Commit operationsAlex Chan2-0/+13
Log whenever we: * Commit an AUM which was previously soft-deleted (which we don't expect to happen in practice, and may indicate an issue with our sync code) * Purge AUMs during a Compact operation. * Successfully commit AUMs as part of a bootstrap or sync operation. All three logs mention `tka` for ease of discoverability. Updates tailscale/corp#39455 Change-Id: I2b07bb0ef075877f40ec34b80bb668be59e1cdc3 Signed-off-by: Alex Chan <alexc@tailscale.com>
2026-04-08vmtest: add VM-based integration test frameworkBrad Fitzpatrick12-11/+1382
Add tstest/natlab/vmtest, a high-level framework for running multi-VM integration tests with mixed OS types (gokrazy + Ubuntu/Debian cloud images) connected via natlab's vnet virtual network. The vmtest package provides: - Env type that orchestrates vnet, QEMU processes, and agent connections - OS image support (Gokrazy, Ubuntu2404, Debian12) with download/cache - QEMU launch per OS type (microvm for gokrazy, q35+KVM for cloud) - Cloud-init seed ISO generation with network-config for multi-NIC - Cross-compilation of test binaries for cloud VMs - Debug SSH NIC on cloud VMs for interactive debugging - Test helpers: ApproveRoutes, HTTPGet, TailscalePing, DumpStatus, WaitForPeerRoute, SSHExec TTA enhancements (cmd/tta): - Parameterize /up (accept-routes, advertise-routes, snat-subnet-routes) - Add /set, /start-webserver, /http-get endpoints - /http-get uses local.Client.UserDial for Tailscale-routed requests - Fix /ping for non-gokrazy systems TestSubnetRouter exercises a 3-VM subnet router scenario: client (gokrazy) → subnet-router (Ubuntu, dual-NIC) → backend (gokrazy) Verifies HTTP access to the backend webserver through the Tailscale subnet route. Passes in ~30 seconds. Updates tailscale/tailscale#13038 Change-Id: I165b64af241d37f5f5870e796a52502fc56146fa Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-04-08tsweb: add TS_DEBUG_TRUSTED_CIDRS envknob to debug (#19283)Jason O'Donnell2-0/+129
Add a new envknob that allows connections from trusted CIDR ranges to access debug endpoints without Tailscale authentication. This is useful for in-cluster scrapers like Prometheus that are not on a tailnet, do not have static IP addresses and cannot use debug keys. Fixes #19282 Signed-off-by: Jason O'Donnell <2160810+jasonodonnell@users.noreply.github.com>
2026-04-08misc: add install-git-hooks.go and git hook for Change-Id trackingBrad Fitzpatrick5-3/+408
Add misc/install-git-hooks.go and misc/git_hook/ to the OSS repo, adapted from the corp repo. The primary motivation is Change-Id generation in commit messages, which provides a persistent identifier for a change across cherry-picks between branches. The installer uses "git rev-parse --git-common-dir" instead of go-git to find the hooks directory, avoiding a new direct dependency while still supporting worktrees. Hooks included: - commit-msg: adds Change-Id trailer - pre-commit: blocks NOCOMMIT / DO NOT SUBMIT markers - pre-push: blocks local-directory replace directives in go.mod - post-checkout: warns when the hook binary is outdated Also update docs/commit-messages.md to reflect that Change-Id is no longer optional in the OSS repo. Updates tailscale/corp#39860 Change-Id: I09066b889118840c0ec6995cc03a9cf464740ffa Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-04-08tool/goexe: refactor to use windows_sysNathan Perry5-249/+79
Updates #19255 Signed-off-by: Nathan Perry <nathan@tailscale.com> Change-Id: Idf69f23b5a61417d5fa3638a276d64856a6a6964
2026-04-08tool: replace go.cmd with a 19KB Rust go.exe wrapperBrad Fitzpatrick11-107/+757
go.cmd used cmd.exe to invoke PowerShell, which mangled arguments: cmd.exe treats ^ as an escape character (so -run "^$" became -run "$", running all tests instead of none) and = signs also caused issues in the PowerShell→cmd.exe argument passing layer. Replace it with a tiny no_std Rust binary (19KB, 32-bit x86 for universal Windows compat: x86/x64/ARM64) that directly invokes the Tailscale Go toolchain via CreateProcessW. The raw command line from GetCommandLineW is passed through to CreateProcessW with only argv[0] replaced, so arguments are never parsed or re-escaped. The binary also handles first-run toolchain download natively using curl.exe and tar.exe (both ship with Windows 10+), so PowerShell is no longer required for normal operation. The PowerShell fallback is only used for the rare TS_USE_GOCROSS=1 path. PowerShell prefers go.exe over go.cmd when resolving ./tool/go, so this is a drop-in replacement. With go.exe in place, the CI can use the natural -bench=. -benchtime=1x -run="^$" flags directly. Also removes tool/go-win.ps1 which is now unused. Updates #19255 Change-Id: I80da23285b74796e7694b89cff29a9fa0eaa6281 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-04-08tstest/natlab/vnet: add multi-NIC node support, DHCP fixes, and VIPsBrad Fitzpatrick4-27/+314
Multi-NIC support: - Add nodeNIC type and node.extraNICs for secondary network interfaces - Add netForMAC/macForNet to route packets to the correct network by MAC - Update initFromConfig to allocate a MAC + LAN IP per network - Fix handleEthernetFrameFromVM, ServeUnixConn to use netForMAC - Fix MACOfIP, writeEth, WriteUDPPacketNoNAT, gVisor write path, and createARPResponse to use macForNet (return the MAC actually on that network, not the node's primary MAC) - Fix createDHCPResponse for multi-NIC (correct client IP and subnet) - Add nodeNICMac for secondary NIC MAC generation - Add Node accessors: NumNICs, NICMac, Networks, LanIP DHCP fixes: - Include LeaseTime, SubnetMask, Router, DNS in DHCP Offer (not just Ack). systemd-networkd requires these to accept an Offer. - Fix DHCP response source IP: use gateway IP instead of echoing the request's destination (which was 255.255.255.255 for discovers) New VIPs: - cloud-init.tailscale: serves per-node cloud-init meta-data, user-data, and network-config for VMs booting with nocloud datasource - files.tailscale: serves binary files (tta, tailscale, tailscaled) registered via RegisterFile for cloud VM provisioning - Add ControlServer() accessor for test control server This is necessary for a three-VM natlab subnet router integration test, coming later. Updates #13038 Change-Id: I59f9f356bae9b5509c117265237983972dfdd5af Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-04-08tstest/integration/testcontrol: notify peers when subnet routes changeBrad Fitzpatrick1-0/+7
When SetSubnetRoutes is called, also send updatePeerChanged to all other connected nodes so they re-fetch their MapResponse and learn about the updated AllowedIPs. Without this, peers never see new subnet routes until they happen to reconnect to the control server. Discovered while working on a three-VM natlab subnet router integration test, coming later. Updates #13038 Change-Id: I20e7a2fda994a8ab0e7a24240e6eae536f4f5f15 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-04-08control/controlclient: avoid calls to ms.netmap() (#19281)Claus Lensbøl2-18/+13
Instead of generating the full netmap, just fetch the peers out of the existing peers map. The extra usage was introduced with netmap caching, but there is no need to call the netmap to get this information; the existing peermap can be used instead. Updates #12639 Signed-off-by: Claus Lensbøl <claus@tailscale.com>
2026-04-08wgengine/netstack: allow UDP listeners to receive traffic on Service VIP addresses (#18972)Tom Meadows2-0/+216
Fixes UDP listeners on VIP Service addresses not receiving inbound traffic. - Modified shouldProcessInbound to check for registered UDP transport endpoints when processing packets to service VIPs - Uses FindTransportEndpoint to determine if a UDP listener exists for the destination VIP/port - Supports both IPv4 and IPv6 The aim was to mirror the existing TCP logic, providing feature parity for UDP-based services on VIP Services. Fixes #18971 Signed-off-by: chaosinthecrd <tom@tmlabs.co.uk>
2026-04-07tsd, all: add Sys.ExtraRootCAs, plumb through TLS dial pathsBrad Fitzpatrick13-4/+108
Add ExtraRootCAs *x509.CertPool to tsd.System and plumb it through the control client, noise transport, DERP, and wgengine layers so that platforms like Android can inject user-installed CA certificates into Go's TLS verification. tlsdial.Config now honors base.RootCAs as additional trusted roots, tried after system roots and before the baked-in LetsEncrypt fallback. SetConfigExpectedCert gets the same treatment for domain-fronted DERP. The Android client will set sys.ExtraRootCAs with a pool built from x509.SystemCertPool + user-installed certs obtained via the Android KeyStore API, replacing the current SSL_CERT_DIR environment variable approach. Updates #8085 Change-Id: Iecce0fd140cd5aa0331b124e55a7045e24d8e0c2 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-04-07go.toolchain.rev: update to Go 1.26.2Brad Fitzpatrick5-5/+5
Updates tailscale/corp#39799 Change-Id: I87c8dbabbbb7df750eb751fd7bfc506f57ca5796 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-04-07derp: align FrameType docs casingJordan Whited3-20/+20
Updates #cleanup Signed-off-by: Jordan Whited <jordan@tailscale.com>
2026-04-07cmd/containerboot: rate-limit IPN bus netmap notificationsDoug Bryant1-3/+3
CPU profiling a containerboot subnet router on a large tailnet showed roughly 40% of CPU spent in serveWatchIPNBus JSON-encoding the full netmap on every update. containerboot only reads SelfNode fields from those notifications (and does a peer lookup when TailnetTargetFQDN is set), so it does not need every intermediate netmap delta. Set ipn.NotifyRateLimit on all three WatchIPNBus calls so netmap notifications are coalesced to one per 3s. Initial-state delivery is unaffected since the rateLimitingBusSender flushes the first send immediately. Updates #cleanup Signed-off-by: Doug Bryant <dougbryant@anthropic.com>
2026-04-07derp/derpserver: add per-connection receive rate limiting (#19222)Mike O'Driscoll3-6/+190
Add server-side per-client bandwidth enforcement using TCP backpressure. When configured, the server calls WaitN after reading each DERP frame, which delays the next read, fills the TCP receive buffer, shrinks the TCP window, and naturally throttles the sender — no packets are dropped. - Rate limiting is on the receive (inbound) side, which is what an abusive client controls - Mesh peers are exempt since they are trusted infrastructure - The burst size is at least MaxPacketSize (64KB) to ensure a single max-size frame can always be processed Also refactors sclient to store a context.Context directly instead of a done channel, which simplifies the rate limiter's WaitN call. Flags added to cmd/derper: --per-client-rate-limit (bytes/sec, default 0 = unlimited) --per-client-rate-burst (bytes, default 0 = 2x rate limit) Example for 10Mbps: --per-client-rate-limit=1250000 Updates #38509 Signed-off-by: Mike O'Driscoll <mikeo@tailscale.com>
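The flag arithmetic can be worked through in a short sketch. The helper names are hypothetical, `maxPacketSize` is assumed to be 64<<10 bytes based on the "64KB" figure above, and the way the default and the minimum combine is an assumption rather than the server's actual logic.

```go
package main

import "fmt"

// maxPacketSize is assumed to be 64 KiB, per the commit message.
const maxPacketSize = 64 << 10

// bytesPerSec converts a rate in megabits per second to bytes per second,
// the unit used by --per-client-rate-limit.
func bytesPerSec(mbps int) int {
	return mbps * 1_000_000 / 8
}

// effectiveBurst applies the two rules from the commit message:
// a burst of 0 defaults to twice the rate, and the result is never
// smaller than one max-size frame.
func effectiveBurst(rate, burst int) int {
	if burst == 0 {
		burst = 2 * rate
	}
	if burst < maxPacketSize {
		burst = maxPacketSize
	}
	return burst
}

func main() {
	rate := bytesPerSec(10) // 10 Mbps
	fmt.Println("rate (bytes/sec):", rate)
	fmt.Println("default burst (bytes):", effectiveBurst(rate, 0))
	fmt.Println("burst for a tiny rate:", effectiveBurst(1000, 0))
}
```

For 10 Mbps the rate works out to 1250000 bytes/sec, matching the example in the commit message.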
2026-04-07licenses: update license noticesLicense Updater2-16/+16
Signed-off-by: License Updater <noreply+license-updater@tailscale.com>
2026-04-07k8s-operator/sessionrecording/ws: unify Read/Write frame parsing (#19227)Fernando Serboncini3-166/+215
Consolidate the duplicated WebSocket frame-parsing logic from Read and Write into a shared processFrames loop, fixing several bugs in the process: - Mixed control and data frames in a single Read/Write call buffer were not handled: a control frame would cause merged data frames to be skipped. - Multiple data frames in one Write call weren't parsed correctly: only the first frame was processed, ignoring the rest in the buffer. - msg.isFinalized was being set before confirming the fragment was complete, so an incomplete msg fragment could sometimes be marked as finalized. - Continuation frames without any payload were being treated as if they didn't have a stream ID, even though the ID is already known from the initial fragment. Fixes tailscale/corp#39583 Signed-off-by: Fernando Serboncini <fserb@tailscale.com> Signed-off-by: chaosinthecrd <tom@tmlabs.co.uk> Co-authored-by: chaosinthecrd <tom@tmlabs.co.uk>
2026-04-07ipn/desktop: move behind feature/condregisterBrad Fitzpatrick4-12/+18
Move the ipn/desktop blank import from cmd/tailscaled/tailscaled_windows.go into feature/condregister/maybe_desktop_sessions.go, consistent with how all other modular features are registered. tailscaled already imports feature/condregister, so it still gets ipn/desktop on Windows. Updates #12614 Change-Id: I92418c4bf0e67f0ab40542e47584762ac0ffa2b2 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-04-07feature/conn25: add IPv6 supportFran Bull3-115/+379
Make the DNS handling portions of conn25 work with IPv6 addresses. Fixes tailscale/corp#37850 Signed-off-by: Fran Bull <fran@tailscale.com>
2026-04-07ipn/desktop: use runtime.Pinner to force heap-allocation of msgNick Khyl1-4/+7
GetMessage can call back into Go, triggering stack growth and causing the stack to be copied to a new memory region, which invalidates the original stack pointer passed to the syscall. Since GetMessage uses that pointer to write the message before returning, this leads to memory corruption. In this PR, we fix this by using runtime.Pinner, which requires the pointer to refer to heap-allocated memory. Fixes #19263 Fixes #17832 Signed-off-by: Nick Khyl <nickk@tailscale.com>
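A portable sketch of the pinning pattern, not the ipn/desktop code itself. The `msg` struct and the `getMessage` helper are hypothetical stand-ins; the `fill` callback plays the role of the Windows GetMessage call that writes through the pointer.

```go
package main

import (
	"fmt"
	"runtime"
)

// msg is a hypothetical stand-in for the Windows MSG structure.
type msg struct {
	hwnd    uintptr
	message uint32
}

// getMessage allocates msg with new and pins it for the duration of the
// call. Passing m to Pin should also make it escape to the heap, so its
// address stays valid even if the goroutine's stack grows and is copied.
func getMessage(fill func(*msg)) msg {
	m := new(msg)
	var pinner runtime.Pinner
	pinner.Pin(m)
	defer pinner.Unpin()

	// In the real code, a pointer to m would be handed to the syscall here.
	fill(m)
	return *m
}

func main() {
	got := getMessage(func(m *msg) {
		m.hwnd = 1
		m.message = 0x0012 // arbitrary example value
	})
	fmt.Printf("hwnd=%d message=%#x\n", got.hwnd, got.message)
}
```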
2026-04-07ipn/localapi, cli, clientmetric: add ipnbus feature tag; fix omit.go stubBrad Fitzpatrick6-4/+37
Add a new "ipnbus" build feature tag so the watch-ipn-bus LocalAPI endpoint can be independently controlled, rather than being gated behind HasDebug || HasServe. Minimal/embedded builds that omit both debug and serve were getting 404s on watch-ipn-bus, breaking "tailscale up --authkey=..." and other CLI flows that depend on WatchIPNBus. In the CLI, check buildfeatures.HasIPNBus before attempting to watch the IPN bus in "tailscale up"/"tailscale login", and exit early with an informational message when the feature is omitted. Also add the missing NewCounterFunc stub to clientmetric/omit.go, which caused compilation errors when building with ts_omit_clientmetrics and netstack enabled. Fixes #19240 Change-Id: I2e3c69a72fc50fa02542b91b8a54859618a463d1 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-04-07wgengine/userspace: add extra check for tsmp learned keys in engine (#19223)Claus Lensbøl2-4/+115
If an entry in the tsmpLearnedDisco does not match the disco key of the key currently being processed, overwrite the key, and leave the entry in the map for later processing. In reality, this should not happen, but is put in as a safety measure with logging of the situation so we can replicate the behaviour and correct it should it happen. Updates #12639 Signed-off-by: Claus Lensbøl <claus@tailscale.com>
2026-04-07control/controlclient: add rwlock to peers in mapsession (#19261)Claus Lensbøl3-10/+97
After moving around locks in 4334dfa7d5ccbee1daf5acf30b33557bbca66525, a data race became possible. Introduce an RWMutex on the mapSession itself for fetching peers. Fixes #19260 Signed-off-by: Claus Lensbøl <claus@tailscale.com>
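The general shape of guarding a peers map with a read-write lock might look like the sketch below. The `session` type and its fields are hypothetical and much simpler than the real mapSession.

```go
package main

import (
	"fmt"
	"sync"
)

// session is a hypothetical, simplified stand-in for mapSession.
type session struct {
	mu    sync.RWMutex
	peers map[int]string // peer ID -> name; guarded by mu
}

func newSession() *session {
	return &session{peers: make(map[int]string)}
}

// setPeer takes the write lock because it mutates the map.
func (s *session) setPeer(id int, name string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.peers[id] = name
}

// peer takes only the read lock, so many readers can run concurrently.
func (s *session) peer(id int) (string, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	name, ok := s.peers[id]
	return name, ok
}

func main() {
	s := newSession()
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			s.setPeer(id, fmt.Sprintf("peer-%d", id))
			s.peer(id) // concurrent reads and writes are now race-free
		}(i)
	}
	wg.Wait()
	name, ok := s.peer(3)
	fmt.Println(name, ok)
}
```

Running a program like this with `go run -race` is one way to confirm that the lock removes the reported race.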
2026-04-07ssh/tailssh: fix race in session termination message deliveryBrad Fitzpatrick2-9/+14
When a recording upload fails mid-session, the recording goroutine cancels the session context. This triggers two concurrent paths: exec.CommandContext kills the process (causing cmd.Wait to return), and killProcessOnContextDone tries to write the termination message via exitOnce.Do. If cmd.Wait returns first, the main goroutine's exitOnce.Do(func(){}) steals the once, and the termination message is never written to the client. Fix by waiting for killProcessOnContextDone to finish writing the termination message (via <-ss.exitHandled) before claiming exitOnce, when the context is already done. Also fix the fallback path when launchProcess itself fails due to context cancellation: use SSHTerminationMessage() with the correct "\r\n\r\n" framing instead of fmt.Fprintf with the internal error string. Deflakes TestSSHRecordingCancelsSessionsOnUploadFailure, which was failing consistently at a low rate due to the exitOnce race. After this fix, flakestress passes with 8,668 runs, 0 failures. Fixes #7707 (again. hopefully for good.) Change-Id: I5ab911c71574db8d3f9d979fb839f273be51ecf9 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-04-07.golangci.yml: enforce gliderssh import alias via importas linterKristoffer Dalby1-0/+6
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>