summaryrefslogtreecommitdiffhomepage
AgeCommit message (Collapse)AuthorFilesLines
2025-11-26net/batching: add envknobs to disable UDP GRO & GSOraggi/envknobs-gso-groJames Tucker1-4/+10
It is sometimes useful when diagnosing subtle and specific performance problems to rule out GRO/GSO independently and/or toggle them to influence packet pacing. Updates #17835 Updates tailscale/corp#31164 Signed-off-by: James Tucker <james@tailscale.com>
2025-11-25cmd/tailscale/cli,ipn,all: make peer relay server port a *uint16Jordan Whited9-51/+48
In preparation for exposing its configuration via ipn.ConfigVAlpha, change {Masked}Prefs.RelayServerPort from *int to *uint16. This takes a defensive stance against invalid inputs at JSON decode time. 'tailscale set --relay-server-port' is currently the only input to this pref, and has always sanitized input to fit within a uint16. Updates tailscale/corp#34591 Signed-off-by: Jordan Whited <jordan@tailscale.com>
2025-11-25ipn/serve: validate service paths in HasPathHandlerSachin Iyer2-0/+44
Fixes #17839 Signed-off-by: Sachin Iyer <siyer@detail.dev>
2025-11-25net/tstun: add TSMPDiscoAdvertisement to TSMPPing (#17995)Claus Lensbøl7-25/+280
Adds a new types of TSMP messages for advertising disco keys keys to/from a peer, and implements the advertising triggered by a TSMP ping. Needed as part of the effort to cache the netmap and still let clients connect without control being reachable. Updates #12639 Signed-off-by: Claus Lensbøl <claus@tailscale.com> Co-authored-by: James Tucker <james@tailscale.com>
2025-11-25ipn/ipnlocal: don't panic if there are no suitable exit nodesAlex Chan2-0/+66
In suggestExitNodeLocked, if no exit node candidates have a home DERP or valid location info, `bestCandidates` is an empty slice. This slice is passed to `selectNode` (`randomNode` in prod): ```go func randomNode(nodes views.Slice[tailcfg.NodeView], …) tailcfg.NodeView { … return nodes.At(rand.IntN(nodes.Len())) } ``` An empty slice becomes a call to `rand.IntN(0)`, which panics. This patch changes the behaviour, so if we've filtered out all the candidates before calling `selectNode`, reset the list and then pick from any of the available candidates. This patch also updates our tests to give us more coverage of `randomNode`, so we can spot other potential issues. Updates #17661 Change-Id: I63eb5e4494d45a1df5b1f4b1b5c6d5576322aa72 Signed-off-by: Alex Chan <alexc@tailscale.com>
2025-11-25tsconsensus: skip integration tests in CIFran Bull1-11/+2
There is an issue to add non-integration tests: #18022 Fixes #15627 #16340 Signed-off-by: Fran Bull <fran@tailscale.com>
2025-11-25tailcfg, control/controlclient: start moving MapResponse.DefaultAutoUpdate ↵Brad Fitzpatrick9-43/+147
to a nodeattr And fix up the TestAutoUpdateDefaults integration tests as they weren't testing reality: the DefaultAutoUpdate is supposed to only be relevant on the first MapResponse in the stream, but the tests weren't testing that. They were instead injecting a 2nd+ MapResponse. This changes the test control server to add a hook to modify the first map response, and then makes the test control when the node goes up and down to make new map responses. Also, the test now runs on macOS where the auto-update feature being disabled would've previously t.Skipped the whole test. Updates #11502 Change-Id: If2319bd1f71e108b57d79fe500b2acedbc76e1a6 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2025-11-25ipn/ipnlocal: test traffic-steering when feature is not enabled (#17997)Simon Law2-0/+23
In PR tailscale/corp#34401, the `traffic-steering` feature flag does not automatically enable traffic steering for all nodes. Instead, an admin must add the `traffic-steering` node attribute to each client node that they want opted-in. For backwards compatibility with older clients, tailscale/corp#34401 strips out the `traffic-steering` node attribute if the feature flag is not enabled, even if it is set in the policy file. This lets us safely disable the feature flag. This PR adds a missing test case for suggested exit nodes that have no priority. Updates tailscale/corp#34399 Signed-off-by: Simon Law <sfllaw@tailscale.com>
2025-11-25ipn/ipnlocal: do not call controlclient.Client.Shutdown with b.mu heldNick Khyl1-7/+10
This fixes a regression in #17804 that caused a deadlock. Updates #18052 Signed-off-by: Nick Khyl <nickk@tailscale.com>
2025-11-25cmd/k8s-operator: allow HA ingresses to be deleted when VIP service does not ↵David Bond3-26/+71
exist (#18050) This commit fixes a bug in our HA ingress reconciler where ingress resources would be stuck in a deleting state should their associated VIP service be deleted within control. The reconciliation loop would check for the existence of the VIP service and if not found perform no additional cleanup steps. The code has been modified to continue onwards even if the VIP service is not found. Fixes: https://github.com/tailscale/tailscale/issues/18049 Signed-off-by: David Bond <davidsbond93@gmail.com>
2025-11-24ipn/ipnlocal: replace log.Printf with logf (#18045)Simon Law2-9/+15
Updates #cleanup Signed-off-by: Simon Law <sfllaw@tailscale.com>
2025-11-24cmd/tailscale,feature/relayserver,ipn: add relay-server-static-endpoints set ↵Jordan Whited8-138/+278
flag Updates tailscale/corp#31489 Updates #17791 Signed-off-by: Jordan Whited <jordan@tailscale.com>
2025-11-24net/udprelay: use blake2s-256 MAC for handshake challengeJordan Whited2-57/+227
This commit replaces crypto/rand challenge generation with a blake2s-256 MAC. This enables the peer relay server to respond to multiple forward disco.BindUDPRelayEndpoint messages per handshake generation without sacrificing the proof of IP ownership properties of the handshake. Responding to multiple forward disco.BindUDPRelayEndpoint messages per handshake generation improves client address/path selection where lowest client->server path/addr one-way delay does not necessarily equate to lowest client<->server round trip delay. It also improves situations where outbound traffic is filtered independent of input, and the first reply disco.BindUDPRelayEndpointChallenge message is dropped on the reply path, but a later reply using a different source would make it through. Reduction in serverEndpoint state saves 112 bytes per instance, trading for slightly more expensive crypto ops: 277ns/op vs 321ns/op on an M1 Macbook Pro. Updates tailscale/corp#34414 Signed-off-by: Jordan Whited <jordan@tailscale.com>
2025-11-24cmd/cigocacher,go.mod: add cigocacher cmdTom Proctor15-38/+470
Adds cmd/cigocacher as the client to cigocached for Go caching over HTTP. The HTTP cache is best-effort only, and builds will fall back to disk-only cache if it's not available, much like regular builds. Not yet used in CI; that will follow in another PR once we have runners available in this repo with the right network setup for reaching cigocached. Updates tailscale/corp#10808 Change-Id: I13ae1a12450eb2a05bd9843f358474243989e967 Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
2025-11-24ipn/ipnlocal: fix panic in driveTransport on network errorAndrew Dunham2-36/+89
When the underlying transport returns a network error, the RoundTrip method returns (nil, error). The defer was attempting to access resp without checking if it was nil first, causing a panic. Fix this by checking for nil in the defer. Also changes driveTransport.tr from *http.Transport to http.RoundTripper and adds a test. Fixes #17306 Signed-off-by: Andrew Dunham <andrew@tailscale.com> Change-Id: Icf38a020b45aaa9cfbc1415d55fd8b70b978f54c
2025-11-23tstest/integration/testcontrol: de-flake TestUserMetricsRouteGaugesAndrew Dunham1-0/+3
SetSubnetRoutes was not sending update notifications to nodes when their approved routes changed, causing nodes to not fetch updated netmaps with PrimaryRoutes populated. This resulted in TestUserMetricsRouteGauges flaking because it waited for PrimaryRoutes to be set, which only happened if the node happened to poll for other reasons. Now send updateSelfChanged notification to affected nodes so they fetch an updated netmap immediately. Fixes #17962 Signed-off-by: Andrew Dunham <andrew@tailscale.com>
2025-11-21portlist,tstest: skip tests on kernels with /proc/net/tcp regressionAndrew Dunham5-1/+112
Linux kernel versions 6.6.102-104 and 6.12.42-45 have a regression in /proc/net/tcp that causes seek operations to fail with "illegal seek". This breaks portlist tests on these kernels. Add kernel version detection for Linux systems and a SkipOnKernelVersions helper to tstest. Use it to skip affected portlist tests on the broken kernel versions. Thanks to philiptaron for the list of kernels with the issue and fix. Updates #16966 Signed-off-by: Andrew Dunham <andrew@tailscale.com>
2025-11-21util/eventbus: use unbounded event queues for DeliveredEvents in subscribersNick Khyl3-39/+16
Bounded DeliveredEvent queues reduce memory usage, but they can deadlock under load. Two common scenarios trigger deadlocks when the number of events published in a short period exceeds twice the queue capacity (there's a PublishedEvent queue of the same size): - a subscriber tries to acquire the same mutex as held by a publisher, or - a subscriber for A events publishes B events Avoiding these scenarios is not practical and would limit eventbus usefulness and reduce its adoption, pushing us back to callbacks and other legacy mechanisms. These deadlocks already occurred in customer devices, dev machines, and tests. They also make it harder to identify and fix slow subscribers and similar issues we have been seeing recently. Choosing an arbitrary large fixed queue capacity would only mask the problem. A client running on a sufficiently large and complex customer environment can exceed any meaningful constant limit, since event volume depends on the number of peers and other factors. Behavior also changes based on scheduling of publishers and subscribers by the Go runtime, OS, and hardware, as the issue is essentially a race between publishers and subscribers. Additionally, on lower-end devices, an unreasonably high constant capacity is practically the same as using unbounded queues. Therefore, this PR changes the event queue implementation to be unbounded by default. The PublishedEvent queue keeps its existing capacity of 16 items, while subscribers' DeliveredEvent queues become unbounded. This change fixes known deadlocks and makes the system stable under load, at the cost of higher potential memory usage, including cases where a queue grows during an event burst and does not shrink when load decreases. Further improvements can be implemented in the future as needed. Fixes #17973 Fixes #18012 Signed-off-by: Nick Khyl <nickk@tailscale.com>
2025-11-21feature/relayserver: don't publish from within a subscribe fn goroutineJordan Whited1-1/+6
Updates #17830 Signed-off-by: Jordan Whited <jordan@tailscale.com>
2025-11-21wgengine/userspace: run link change subscribers in eventqueue (#18024)Claus Lensbøl1-1/+7
Updates #17996 Signed-off-by: Claus Lensbøl <claus@tailscale.com>
2025-11-21util/eventbus: add tests for a subscriber publishing eventsNick Khyl1-0/+60
As of 2025-11-20, publishing more events than the eventbus's internal queues can hold may deadlock if a subscriber tries to publish events itself. This commit adds a test that demonstrates this deadlock, and skips it until the bug is fixed. Updates #18012 Signed-off-by: Nick Khyl <nickk@tailscale.com>
2025-11-21util/eventbus: add tests for a subscriber trying to acquire the same mutex ↵Nick Khyl1-0/+70
as a publisher As of 2025-11-20, publishing more events than the eventbus's internal queues can hold may deadlock if a subscriber tries to acquire a mutex that can also be held by a publisher. This commit adds a test that demonstrates this deadlock, and skips it until the bug is fixed. Updates #17973 Signed-off-by: Nick Khyl <nickk@tailscale.com>
2025-11-21tka: don't panic if no clock set in tka.MemAlex Chan1-1/+11
This is causing confusing panics in tailscale/corp#34485. We'll keep using the tka.ChonkMem constructor as much as we can, but don't panic if you create a tka.Mem directly -- we know what the sensible thing is. Updates #cleanup Signed-off-by: Alex Chan <alexc@tailscale.com> Change-Id: I49309f5f403fc26ce4f9a6cf0edc8eddf6a6f3a4
2025-11-20cmd/tailscaled,ipn: show a health warning when state store fails to open ↵Andrew Lytvynov14-4/+211
(#17883) With the introduction of node sealing, store.New fails in some cases due to the TPM device being reset or unavailable. Currently it results in tailscaled crashing at startup, which is not obvious to the user until they check the logs. Instead of crashing tailscaled at startup, start with an in-memory store with a health warning about state initialization and a link to (future) docs on what to do. When this health message is set, also block any login attempts to avoid masking the problem with an ephemeral node registration. Updates #15830 Updates #17654 Signed-off-by: Andrew Lytvynov <awly@tailscale.com>
2025-11-20go.mod: bump golang.org/x/crypto (#18011)Andrew Lytvynov5-6/+6
Pick up fixes for https://pkg.go.dev/vuln/GO-2025-4134 Updates #cleanup Signed-off-by: Andrew Lytvynov <awly@tailscale.com>
2025-11-20ipn/ipnlocal: add validations when setting serve config (#17950)Harry Harpham5-314/+483
These validations were previously performed in the CLI frontend. There are two motivations for moving these to the local backend: 1. The backend controls synchronization around the relevant state, so only the backend can guarantee many of these validations. 2. Doing these validations in the back-end avoids the need to repeat them across every frontend (e.g. the CLI and tsnet). Updates tailscale/corp#27200 Signed-off-by: Harry Harpham <harry@tailscale.com>
2025-11-20cmd/k8s-operator: add multi replica support for recorders (#17864)David Bond10-151/+380
This commit adds the `spec.replicas` field to the `Recorder` custom resource that allows for a highly available deployment of `tsrecorder` within a kubernetes cluster. Many changes were required here as the code hard-coded the assumption of a single replica. This has required a few loops, similar to what we do for the `Connector` resource to create auth and state secrets. It was also required to add a check to remove dangling state and auth secrets should the recorder be scaled down. Updates: https://github.com/tailscale/tailscale/issues/17965 Signed-off-by: David Bond <davidsbond93@gmail.com>
2025-11-19net/netns: remove spammy logs for interface binding capsJonathan Nobels1-6/+10
fixes tailscale/tailscale#17990 The logging for the netns caps is spammy. Log only on changes to the values and don't log Darwin specific stuff on non Darwin clients. Signed-off-by: Jonathan Nobels <jonathan@tailscale.com>
2025-11-19net/batching: fix import formattingBrad Fitzpatrick1-1/+0
From #17842 Updates #cleanup Change-Id: Ie041b50659361b50558d5ec1f557688d09935f7c Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2025-11-20cmd/k8s-operator: use stable image for k8s-nameserver (#17985)David Bond2-5/+4
This commit modifies the kubernetes operator to use the "stable" version of `k8s-nameserver` by default. Updates: https://github.com/tailscale/corp/issues/19028 Signed-off-by: David Bond <davidsbond93@gmail.com>
2025-11-19cmd/tailscale/cli: allow remote target as service destination (#17607)KevinLiang1010-44/+221
This commit enables user to set service backend to remote destinations, that can be a partial URL or a full URL. The commit also prevents user to set remote destinations on linux system when socket mark is not working. For user on any version of mac extension they can't serve a service either. The socket mark usability is determined by a new local api. Fixes tailscale/corp#24783 Signed-off-by: KevinLiang10 <37811973+KevinLiang10@users.noreply.github.com>
2025-11-19licenses: update license noticesLicense Updater1-0/+1
Signed-off-by: License Updater <noreply+license-updater@tailscale.com>
2025-11-19ipn/ipnlocal: remove the always-true CanSupportNetworkLock()Alex Chan1-28/+0
Now that we support using an in-memory backend for TKA state (#17946), this function always returns `nil` – we can always support Network Lock. We don't need it any more. Plus, clean up a couple of errant TODOs from that PR. Updates tailscale/corp#33599 Change-Id: Ief93bb9adebb82b9ad1b3e406d1ae9d2fa234877 Signed-off-by: Alex Chan <alexc@tailscale.com>
2025-11-19util/eventbus: simplify some reflect in Bus.pumpBrad Fitzpatrick1-1/+1
Updates #cleanup Change-Id: Ib7b497e22c6cdd80578c69cf728d45754e6f909e Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2025-11-19cmd/tailscale/cli: remove Latin abbreviations from CLI help textAlex Chan2-2/+2
Our style guide recommends avoiding Latin abbreviations in technical documentation, which includes the CLI help text. This is causing linter issues for the docs site, because this help text is copied into the docs. See http://go/style-guide/kb/language-and-grammar/abbreviations#latin-abbreviations Updates #cleanup Change-Id: I980c28d996466f0503aaaa65127685f4af608039 Signed-off-by: Alex Chan <alexc@tailscale.com>
2025-11-19ipn/ipnlocal: reduce profileManager boilerplate in network-lock testsAlex Chan1-83/+33
Updates tailscale/corp#33537 Signed-off-by: Alex Chan <alexc@tailscale.com>
2025-11-19cmd/k8s-operator: fix type comparison in apiserver proxy template (#17981)Raj Singh1-3/+3
ArgoCD sends boolean values but the template expects strings, causing "incompatible types for comparison" errors. Wrap values with toString so both work. Fixes #17158 Signed-off-by: Raj Singh <raj@tailscale.com>
2025-11-19ipn/ipnlocal, tka: compact TKA state after every syncAlex Chan10-31/+276
Previously a TKA compaction would only run when a node starts, which means a long-running node could use unbounded storage as it accumulates ever-increasing amounts of TKA state. This patch changes TKA so it runs a compaction after every sync. Updates https://github.com/tailscale/corp/issues/33537 Change-Id: I91df887ea0c5a5b00cb6caced85aeffa2a4b24ee Signed-off-by: Alex Chan <alexc@tailscale.com>
2025-11-19cmd/k8s-operator: default to stable image (#17848)David Bond2-3/+3
This commit modifies the helm/static manifest configuration for the k8s-operator to prefer the stable image tag. This avoids making those using static manifests seeing unstable behaviour by default if they do not manually make the change. This is managed for us when using helm but not when generating the static manifests. Updates https://github.com/tailscale/tailscale/issues/10655 Signed-off-by: David Bond <davidsbond93@gmail.com>
2025-11-18feature/featuretags: add CacheNetMap feature tag for upcoming workBrad Fitzpatrick3-0/+30
(trying to get in smaller obvious chunks ahead of later PRs to make them smaller) Updates #17925 Change-Id: I184002001055790484e4792af8ffe2a9a2465b2e Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2025-11-18tailcfg: add some omitzero, adjust some omitempty to omitzeroBrad Fitzpatrick1-113/+113
Updates tailscale/corp#25406 Change-Id: I7832dbe3dce3774bcc831e3111feb75bcc9e021d Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2025-11-18cmd/netlogfmt: support resolving IP addresses to synonymous labels (#17955)Joe Tsai1-62/+87
We now embed node information into network flow logs. By default, netlogfmt still prints out using Tailscale IP addresses. Support a "--resolve-addrs=TYPE" flag that can be used to specify resolving IP addresses as node IDs, hostnames, users, or tags. Updates tailscale/corp#33352 Signed-off-by: Joe Tsai <joetsai@digital-static.net>
2025-11-18types/key,wgengine/magicsock,control/controlclient,ipn: add debug disco key ↵James Tucker16-37/+375
rotation Adds the ability to rotate discovery keys on running clients, needed for testing upcoming disco key distribution changes. Introduces key.DiscoKey, an atomic container for a disco private key, public key, and the public key's ShortString, replacing the prior separate atomic fields. magicsock.Conn has a new RotateDiscoKey method, and access to this is provided via localapi and a CLI debug command. Note that this implementation is primarily for testing as it stands, and regular use should likely introduce an additional mechanism that allows the old key to be used for some time, to provide a seamless key rotation rather than one that invalidates all sessions. Updates tailscale/corp#34037 Signed-off-by: James Tucker <james@tailscale.com>
2025-11-18appc: add ippool typeFran Bull2-0/+121
As part of the conn25 work we will want to be able to keep track of a pool of IP Addresses and know which have been used and which have not. Fixes tailscale/corp#34247 Signed-off-by: Fran Bull <fran@tailscale.com>
2025-11-18tka: marshal AUMHash totext even if Tailnet Lock is omittedAlex Chan1-7/+18
We use `tka.AUMHash` in `netmap.NetworkMap`, and we serialise it as JSON in the `/debug/netmap` C2N endpoint. If the binary omits Tailnet Lock support, the debug endpoint returns an error because it's unable to marshal the AUMHash. This patch adds a sentinel value so this marshalling works, and we can use the debug endpoint. Updates https://github.com/tailscale/tailscale/issues/17115 Signed-off-by: Alex Chan <alexc@tailscale.com> Change-Id: I51ec1491a74e9b9f49d1766abd89681049e09ce4
2025-11-18tka: mark young AUMs as active even if the chain is longAnton Tolchanov2-10/+44
Existing compaction logic seems to have had an assumption that markActiveChain would cover a longer part of the chain than markYoungAUMs. This prevented long, but fresh, chains, from being compacted correctly. Updates tailscale/corp#33537 Signed-off-by: Anton Tolchanov <anton@tailscale.com>
2025-11-18types/netmap,*: remove some redundant fields from NetMapBrad Fitzpatrick14-42/+59
Updates #12639 Change-Id: Ia50b15529bd1c002cdd2c937cdfbe69c06fa2dc8 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2025-11-18.github/workflows: make go_generate check detect new filesBrad Fitzpatrick1-0/+1
Updates #17957 Change-Id: I904fd5b544ac3090b58c678c4726e7ace41a52dd Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2025-11-18feature/buildfeatures: re-run go generateBrad Fitzpatrick2-6/+6
6a73c0bdf55 added a feature tag but didn't re-run go generate on ./feature/buildfeatures. Updates #9192 Change-Id: I7819450453e6b34c60cad29d2273e3e118291643 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2025-11-18cmd/vet/jsontags: fix a typo in an error messageAlex Chan1-2/+2
Updates #17945 Change-Id: I8987271420feb190f5e4d85caff305c8d4e84aae Signed-off-by: Alex Chan <alexc@tailscale.com>