summaryrefslogtreecommitdiffhomepage
path: root/cmd/containerboot
AgeCommit message (Collapse)AuthorFilesLines
2026-04-15cmd/containerboot,cmd/k8s-proxy,kube: add authkey renewal to k8s-proxy (#19221)Tom Meadows3-120/+35
* kube/authkey,cmd/containerboot: extract shared auth key reissue package Move auth key reissue logic (set marker, wait for new key, clear marker, read config) into a shared kube/authkey package and update containerboot to use it. No behaviour change. Updates #14080 Signed-off-by: chaosinthecrd <tom@tmlabs.co.uk> * kube/authkey,kube/state,cmd/containerboot: preserve device_id across restarts Stop clearing device_id, device_fqdn, and device_ips from state on startup. These keys are now preserved across restarts so the operator can track device identity. Expand ClearReissueAuthKey to clear device state and tailscaled profile data when performing a full auth key reissue. Updates #14080 Signed-off-by: chaosinthecrd <tom@tmlabs.co.uk> * cmd/containerboot: use root context for auth key reissue wait Pass the root context instead of bootCtx to setAndWaitForAuthKeyReissue. The 60-second bootCtx timeout was cancelling the reissue wait before the operator had time to respond, causing the pod to crash-loop. Updates #14080 Signed-off-by: chaosinthecrd <tom@tmlabs.co.uk> * cmd/k8s-proxy: add auth key renewal support Add auth key reissue handling to k8s-proxy, mirroring containerboot. When the proxy detects an auth failure (login-state health warning or NeedsLogin state), it disconnects from control, signals the operator via the state Secret, waits for a new key, clears stale state, and exits so Kubernetes restarts the pod with the new key. A health watcher goroutine runs alongside ts.Up() to short-circuit the startup timeout on terminal auth failures. Updates #14080 Signed-off-by: chaosinthecrd <tom@tmlabs.co.uk> --------- Signed-off-by: chaosinthecrd <tom@tmlabs.co.uk>
2026-04-13cmd/containerboot: mark TestContainerBoot as flakyBrad Fitzpatrick1-0/+2
Updates #19380 Change-Id: Ib1be53836e37224265d10abd0c2213644ea54d64 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-04-09cmd/k8s-operator: migrate to tailscale-client-go-v2 (#19010)David Bond1-2/+1
This commit modifies the kubernetes operator to use the `tailscale-client-go-v2` package instead of the internal tailscale client it was previously using. This now gives us the ability to expand out custom resources and features as they become available via the API module. The tailnet reconciler has also been modified to manage clients as tailnets are created and removed, providing each subsequent reconciler with a single `ClientProvider` that obtains a tailscale client for the respective tailnet by name, or the operator's default when presented with a blank string. Fixes: https://github.com/tailscale/corp/issues/38418 Signed-off-by: David Bond <davidsbond93@gmail.com>
2026-04-07cmd/containerboot: rate-limit IPN bus netmap notificationsDoug Bryant1-3/+3
CPU profiling a containerboot subnet router on a large tailnet showed roughly 40% of CPU spent in serveWatchIPNBus JSON-encoding the full netmap on every update. containerboot only reads SelfNode fields from those notifications (and does a peer lookup when TailnetTargetFQDN is set), so it does not need every intermediate netmap delta. Set ipn.NotifyRateLimit on all three WatchIPNBus calls so netmap notifications are coalesced to one per 3s. Initial-state delivery is unaffected since the rateLimitingBusSender flushes the first send immediately. Updates #cleanup Signed-off-by: Doug Bryant <dougbryant@anthropic.com>
2026-04-05cmd/vet: add subtestnames analyzer; fix all existing violationsBrad Fitzpatrick2-27/+27
Add a new vet analyzer that checks t.Run subtest names don't contain characters requiring quoting when re-running via "go test -run". This enforces the style guide rule: don't use spaces or punctuation in subtest names. The analyzer flags: - Direct t.Run calls with string literal names containing spaces, regex metacharacters, quotes, or other problematic characters - Table-driven t.Run(tt.name, ...) calls where tt ranges over a slice/map literal with bad name field values Also fix all 978 existing violations across 81 test files, replacing spaces with hyphens and shortening long sentence-like names to concise hyphenated forms. Updates #19242 Change-Id: Ib0ad96a111bd8e764582d1d4902fe2599454ab65 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-03-11cmd/{containerboot,k8s-operator}: reissue auth keys for broken proxies (#16450)Tom Proctor4-44/+431
Adds logic for containerboot to signal that it can't auth, so the operator can reissue a new auth key. This only applies when running with a config file and with a kube state store. If the operator sees reissue_authkey in a state Secret, it will create a new auth key iff the config has no auth key or its auth key matches the value of reissue_authkey from the state Secret. This is to ensure we don't reissue auth keys in a tight loop if the proxy is slow to start or failing for some other reason. The reissue logic also uses a burstable rate limiter to ensure there's no way a terminally misconfigured or buggy operator can automatically generate new auth keys in a tight loop. Additional implementation details (ChaosInTheCRD): - Added `ipn.NotifyInitialHealthState` to ipn watcher, to ensure that `n.Health` is populated when notify's are returned. - on auth failure, containerboot: - Disconnects from control server - Sets reissue_authkey marker in state Secret with the failing key - Polls config file for new auth key (10 minute timeout) - Restarts after receiving new key to apply it - modified operator's reissue logic slightly: - Deletes old device from tailnet before creating new key - Rate limiting: 1 key per 30s with initial burst equal to replica count - In-flight tracking (authKeyReissuing map) prevents duplicate API calls across reconcile loops Updates #14080 Change-Id: I6982f8e741932a6891f2f48a2936f7f6a455317f (cherry picked from commit 969927c47c3d4de05e90f5b26a6d8d931c5ceed4) Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com> Co-authored-by: chaosinthecrd <tom@tmlabs.co.uk>
2026-03-06all: use Go 1.26 things, run most gofix modernizersBrad Fitzpatrick2-5/+4
I omitted a lot of the min/max modernizers because they didn't result in more clear code. Some of it's older "for x := range 123". Also: errors.AsType, any, fmt.Appendf, etc. Updates #18682 Change-Id: I83a451577f33877f962766a5b65ce86f7696471c Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-03-05types/ptr: deprecate ptr.To, use Go 1.26 newBrad Fitzpatrick2-15/+13
Updates #18682 Change-Id: I62f6aa0de2a15ef8c1435032c6aa74a181c25f8f Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-02-25cmd/containerboot, net/dns/resolver: remove unused funcs in testsBrad Fitzpatrick1-6/+0
staticcheck was complaining about it on a PR I sent: https://github.com/tailscale/tailscale/actions/runs/22408882872/job/64876543467?pr=18804 And: https://github.com/tailscale/tailscale/actions/runs/22408882872/job/64876543475?pr=18804 Updates #cleanup Updates #18157 Change-Id: I6225481f3aab9e43ef1920aa1a12e86c5073a638 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-02-20cmd/containerboot,kube: enable autoadvertisement of Tailscale services on ↵Tom Meadows6-65/+269
containerboot (#18527) * cmd/containerboot,kube/services: support the ability to automatically advertise services on startup Updates #17769 Signed-off-by: chaosinthecrd <tom@tmlabs.co.uk> * cmd/containerboot: don't assume we want to use kube state store if in kubernetes Fixes #8188 Signed-off-by: chaosinthecrd <tom@tmlabs.co.uk> --------- Signed-off-by: chaosinthecrd <tom@tmlabs.co.uk>
2026-02-10cmd/containerboot: fix error handling for egress (#18657)BeckyPauley1-1/+2
Fixes #18631 Signed-off-by: Becky Pauley <becky@tailscale.com>
2026-02-03cmd/containerboot: handle v6 pod ips that are missing square brackets (#18519)David Bond2-0/+40
This commit fixes an issue within containerboot that arose from the kubernetes operator. When users enable metrics on custom resources that are running on dual stack or ipv6 only clusters, they end up with an error as we pass the hostport combintation using $(POD_IP):PORT. In go, `netip.ParseAddrPort` expects square brackets `[]` to wrap the host portion of an ipv6 address and would naturally, crash. When loading the containerboot configuration from the environment we now check if the `TS_LOCAL_ADDR_PORT` value contains the pod's v6 ip address. If it does & does not already contain brackets, we add the brackets in. Closes: #15762 Closes: #15467 Signed-off-by: David Bond <davidsbond93@gmail.com>
2026-01-23all: remove AUTHORS file and references to itWill Norris14-14/+14
This file was never truly necessary and has never actually been used in the history of Tailscale's open source releases. A Brief History of AUTHORS files --- The AUTHORS file was a pattern developed at Google, originally for Chromium, then adopted by Go and a bunch of other projects. The problem was that Chromium originally had a copyright line only recognizing Google as the copyright holder. Because Google (and most open source projects) do not require copyright assignemnt for contributions, each contributor maintains their copyright. Some large corporate contributors then tried to add their own name to the copyright line in the LICENSE file or in file headers. This quickly becomes unwieldy, and puts a tremendous burden on anyone building on top of Chromium, since the license requires that they keep all copyright lines intact. The compromise was to create an AUTHORS file that would list all of the copyright holders. The LICENSE file and source file headers would then include that list by reference, listing the copyright holder as "The Chromium Authors". This also become cumbersome to simply keep the file up to date with a high rate of new contributors. Plus it's not always obvious who the copyright holder is. Sometimes it is the individual making the contribution, but many times it may be their employer. There is no way for the proejct maintainer to know. Eventually, Google changed their policy to no longer recommend trying to keep the AUTHORS file up to date proactively, and instead to only add to it when requested: https://opensource.google/docs/releasing/authors. They are also clear that: > Adding contributors to the AUTHORS file is entirely within the > project's discretion and has no implications for copyright ownership. It was primarily added to appease a small number of large contributors that insisted that they be recognized as copyright holders (which was entirely their right to do). But it's not truly necessary, and not even the most accurate way of identifying contributors and/or copyright holders. In practice, we've never added anyone to our AUTHORS file. It only lists Tailscale, so it's not really serving any purpose. It also causes confusion because Tailscalars put the "Tailscale Inc & AUTHORS" header in other open source repos which don't actually have an AUTHORS file, so it's ambiguous what that means. Instead, we just acknowledge that the contributors to Tailscale (whoever they are) are copyright holders for their individual contributions. We also have the benefit of using the DCO (developercertificate.org) which provides some additional certification of their right to make the contribution. The source file changes were purely mechanical with: git ls-files | xargs sed -i -e 's/\(Tailscale Inc &\) AUTHORS/\1 contributors/g' Updates #cleanup Change-Id: Ia101a4a3005adb9118051b3416f5a64a4a45987d Signed-off-by: Will Norris <will@tailscale.com>
2026-01-14cmd/containerboot: allow for automatic ID token generationMario Minardi4-8/+79
Allow for optionally specifying an audience for containerboot. This is passed to tailscale up to allow for containerboot to use automatic ID token generation for authentication. Updates https://github.com/tailscale/corp/issues/34430 Signed-off-by: Mario Minardi <mario@tailscale.com>
2026-01-07cmd/containerboot: add OAuth and WIF auth support (#18311)Raj Singh4-10/+131
Fixes tailscale/corp#34430 Signed-off-by: Raj Singh <raj@tailscale.com>
2025-12-18cmd/containerboot: support egress to Tailscale Service FQDNs (#17493)Tom Proctor3-55/+123
Adds support for targeting FQDNs that are a Tailscale Service. Uses the same method of searching for Services as the tailscale configure kubeconfig command. This fixes using the tailscale.com/tailnet-fqdn annotation for Kubernetes Service when the specified FQDN is a Tailscale Service. Fixes #16534 Change-Id: I422795de76dc83ae30e7e757bc4fbd8eec21cc64 Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com> Signed-off-by: Becky Pauley <becky@tailscale.com>
2025-11-18all: rename variables with lowercase-l/uppercase-IAlex Chan1-20/+20
See http://go/no-ell Signed-off-by: Alex Chan <alexc@tailscale.com> Updates #cleanup Change-Id: I8c976b51ce7a60f06315048b1920516129cc1d5d
2025-09-28util/backoff: rename logtail/backoff package to util/backoffBrad Fitzpatrick1-1/+1
It has nothing to do with logtail and is confusing named like that. Updates #cleanup Updates #17323 Change-Id: Idd34587ba186a2416725f72ffc4c5778b0b9db4a Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2025-09-19util/eventbus/eventbustest: fix typo of test nameBrad Fitzpatrick1-1/+1
And another case of the same typo in a comment elsewhere. Updates #cleanup Change-Id: Iaa9d865a1cf83318d4a30263c691451b5d708c9c Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2025-09-04cmd/containerboot: do not reset state on non-existant secret (#17021)David Bond1-2/+5
This commit modifies containerboot's state reset process to handle the state secret not existing. During other parts of the boot process we gracefully handle the state secret not being created yet, but missed that check within `resetContainerbootState` Fixes https://github.com/tailscale/tailscale/issues/16804 Signed-off-by: David Bond <davidsbond93@gmail.com>
2025-08-28syncs: delete WaitGroup and use sync.WaitGroup.Go in Go 1.25Joe Tsai1-3/+2
Our own WaitGroup wrapper type was a prototype implementation for the Go method on the standard sync.WaitGroup type. Now that there is first-class support for Go, we should migrate over to using it and delete syncs.WaitGroup. Updates #cleanup Updates tailscale/tailscale#16330 Change-Id: Ib52b10f9847341ce29b4ca0da927dc9321691235 Signed-off-by: Joe Tsai <joetsai@digital-static.net>
2025-07-22cmd/{k8s-proxy,containerboot,k8s-operator},kube: add health check and ↵David Bond3-143/+9
metrics endpoints for k8s-proxy (#16540) * Modifies the k8s-proxy to expose health check and metrics endpoints on the Pod's IP. * Moves cmd/containerboot/healthz.go and cmd/containerboot/metrics.go to /kube to be shared with /k8s-proxy. Updates #13358 Signed-off-by: David Bond <davidsbond93@gmail.com>
2025-07-21all-kube: create Tailscale Service for HA kube-apiserver ProxyGroup (#16572)Tom Proctor5-454/+7
Adds a new reconciler for ProxyGroups of type kube-apiserver that will provision a Tailscale Service for each replica to advertise. Adds two new condition types to the ProxyGroup, TailscaleServiceValid and TailscaleServiceConfigured, to post updates on the state of that reconciler in a way that's consistent with the service-pg reconciler. The created Tailscale Service name is configurable via a new ProxyGroup field spec.kubeAPISserver.ServiceName, which expects a string of the form "svc:<dns-label>". Lots of supporting changes were needed to implement this in a way that's consistent with other operator workflows, including: * Pulled containerboot's ensureServicesUnadvertised and certManager into kube/ libraries to be shared with k8s-proxy. Use those in k8s-proxy to aid Service cert sharing between replicas and graceful Service shutdown. * For certManager, add an initial wait to the cert loop to wait until the domain appears in the devices's netmap to avoid a guaranteed error on the first issue attempt when it's quick to start. * Made several methods in ingress-for-pg.go and svc-for-pg.go into functions to share with the new reconciler * Added a Resource struct to the owner refs stored in Tailscale Service annotations to be able to distinguish between Ingress- and ProxyGroup- based Services that need cleaning up in the Tailscale API. * Added a ListVIPServices method to the internal tailscale client to aid cleaning up orphaned Services * Support for reading config from a kube Secret, and partial support for config reloading, to prevent us having to force Pod restarts when config changes. * Fixed up the zap logger so it's possible to set debug log level. Updates #13358 Change-Id: Ia9607441157dd91fb9b6ecbc318eecbef446e116 Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
2025-06-27cmd/{containerboot,k8s-operator}: use state Secret for checking device auth ↵Tom Proctor4-84/+189
(#16328) Previously, the operator checked the ProxyGroup status fields for information on how many of the proxies had successfully authed. Use their state Secrets instead as a more reliable source of truth. containerboot has written device_fqdn and device_ips keys to the state Secret since inception, and pod_uid since 1.78.0, so there's no need to use the API for that data. Read it from the state Secret for consistency. However, to ensure we don't read data from a previous run of containerboot, make sure we reset containerboot's state keys on startup. One other knock-on effect of that is ProxyGroups can briefly be marked not Ready while a Pod is restarting. Introduce a new ProxyGroupAvailable condition to more accurately reflect when downstream controllers can implement flows that rely on a ProxyGroup having at least 1 proxy Pod running. Fixes #16327 Change-Id: I026c18e9d23e87109a471a87b8e4fb6271716a66 Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
2025-05-30cmd/containerboot: allow setting --accept-dns via TS_EXTRA_ARGS again (#16129)Irbe Krumina3-91/+322
In 1.84 we made 'tailscale set'/'tailscale up' error out if duplicate command line flags are passed. This broke some container configurations as we have two env vars that can be used to set --accept-dns flag: - TS_ACCEPT_DNS- specifically for --accept-dns - TS_EXTRA_ARGS- accepts any arbitrary 'tailscale up'/'tailscale set' flag. We default TS_ACCEPT_DNS to false (to make the container behaviour more declarative), which with the new restrictive CLI behaviour resulted in failure for users who had set --accept-dns via TS_EXTRA_ARGS as the flag would be provided twice. This PR re-instates the previous behaviour by checking if TS_EXTRA_ARGS contains --accept-dns flag and if so using its value to override TS_ACCEPT_DNS. Updates tailscale/tailscale#16108 Signed-off-by: Irbe Krumina <irbe@tailscale.com>
2025-05-19cmd/containerboot,kube/ingressservices: proxy VIPService TCP/UDP traffic to ↵Irbe Krumina8-793/+1392
cluster Services (#15897) cmd/containerboot,kube/ingressservices: proxy VIPService TCP/UDP traffic to cluster Services This PR is part of the work to implement HA for Kubernetes Operator's network layer proxy. Adds logic to containerboot to monitor mounted ingress firewall configuration rules and update iptables/nftables rules as the config changes. Also adds new shared types for the ingress configuration. The implementation is intentionally similar to that for HA for egress proxy. Updates tailscale/tailscale#15895 Signed-off-by: chaosinthecrd <tom@tmlabs.co.uk> Signed-off-by: Irbe Krumina <irbe@tailscale.com>
2025-04-09cmd/{containerboot,k8s-operator},kube/kubetypes: unadvertise ingress ↵Tom Proctor4-17/+65
services on shutdown (#15451) Ensure no services are advertised as part of shutting down tailscaled. Prefs are only edited if services are currently advertised, and they're edited we wait for control's ~15s (+ buffer) delay to failover. Note that editing prefs will trigger a synchronous write to the state Secret, so it may fail to persist state if the ProxyGroup is getting scaled down and therefore has its RBAC deleted at the same time, but that failure doesn't stop prefs being updated within the local backend, doesn't affect connectivity to control, and the state Secret is about to get deleted anyway, so the only negative side effect is a harmless error log during shutdown. Control still learns that the node is no longer advertising the service and triggers the failover. Note that the first version of this used a PreStop lifecycle hook, but that only supports GET methods and we need the shutdown to trigger side effects (updating prefs) so it didn't seem appropriate to expose that functionality on a GET endpoint that's accessible on the k8s network. Updates tailscale/corp#24795 Change-Id: I0a9a4fe7a5395ca76135ceead05cbc3ee32b3d3c Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
2025-04-08cmd/containerboot: speed up tests (#14883)Tom Proctor3-762/+812
The test suite had grown to about 20s on my machine, but it doesn't do much taxing work so was a good candidate to parallelise. Now runs in under 2s on my machine. Updates #cleanup Change-Id: I2fcc6be9ca226c74c0cb6c906778846e959492e4 Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
2025-03-26cmd/{k8s-operator,containerboot}: check TLS cert before advertising ↵Irbe Krumina1-1/+10
VIPService (#15427) cmd/{k8s-operator,containerboot}: check TLS cert before advertising VIPService - Ensures that Ingress status does not advertise port 443 before TLS cert has been issued - Ensure that Ingress backends do not advertise a VIPService before TLS cert has been issued, unless the service also exposes port 80 Updates tailscale/corp#24795 Signed-off-by: Irbe Krumina <irbe@tailscale.com>
2025-03-14cmd/containerboot: manage HA Ingress TLS certs from containerboot (#15303)Irbe Krumina7-5/+419
cmd/containerboot: manage HA Ingress TLS certs from containerboot When ran as HA Ingress node, containerboot now can determine whether it should manage TLS certs for the HA Ingress replicas and call the LocalAPI cert endpoint to ensure initial issuance and renewal of the shared TLS certs. Updates tailscale/corp#24795 Signed-off-by: Irbe Krumina <irbe@tailscale.com>
2025-02-27cmd/containerboot: fix nil pointer exception (#15090)Irbe Krumina2-4/+11
Updates tailscale/tailscale#15081 Signed-off-by: Irbe Krumina <irbe@tailscale.com>
2025-02-05all: use new LocalAPI client package locationBrad Fitzpatrick5-15/+15
It was moved in f57fa3cbc30e. Updates tailscale/corp#22748 Change-Id: I19f965e6bded1d4c919310aa5b864f2de0cd6220 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2025-01-30cmd/containerboot: wait for consistent state on shutdown (#14263)Tom Proctor5-75/+294
tailscaled's ipn package writes a collection of keys to state after authenticating to control, but one at a time. If containerboot happens to send a SIGTERM signal to tailscaled in the middle of writing those keys, it may shut down with an inconsistent state Secret and never recover. While we can't durably fix this with our current single-use auth keys (no atomic operation to auth + write state), we can reduce the window for this race condition by checking for partial state before sending SIGTERM to tailscaled. Best effort only. Updates #14080 Change-Id: I0532d51b6f0b7d391e538468bd6a0a80dbe1d9f7 Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
2025-01-29cmd/{k8s-operator,containerboot},kube: ensure egress ProxyGroup proxies ↵Irbe Krumina6-73/+484
don't terminate while cluster traffic is still routed to them (#14436) cmd/{containerboot,k8s-operator},kube: add preshutdown hook for egress PG proxies This change is part of work towards minimizing downtime during update rollouts of egress ProxyGroup replicas. This change: - updates the containerboot health check logic to return Pod IP in headers, if set - always runs the health check for egress PG proxies - updates ClusterIP Services created for PG egress endpoints to include the health check endpoint - implements preshutdown endpoint in proxies. The preshutdown endpoint logic waits till, for all currently configured egress services, the ClusterIP Service health check endpoint is no longer returned by the shutting-down Pod (by looking at the new Pod IP header). - ensures that kubelet is configured to call the preshutdown endpoint This reduces the possibility that, as replicas are terminated during an update, a replica gets terminated to which cluster traffic is still being routed via the ClusterIP Service because kube proxy has not yet updated routig rules. This is not a perfect check as in practice, it only checks that the kube proxy on the node on which the proxy runs has updated rules. However, overall this might be good enough. The preshutdown logic is disabled if users have configured a custom health check port via TS_LOCAL_ADDR_PORT env var. This change throws a warnign if so and in future setting of that env var for operator proxies might be disallowed (as users shouldn't need to configure this for a Pod directly). This is backwards compatible with earlier proxy versions. Updates tailscale/tailscale#14326 Signed-off-by: Irbe Krumina <irbe@tailscale.com>
2025-01-21cmd/{k8s-operator,containerboot},kube/kubetypes: parse Ingresses for ingress ↵Irbe Krumina1-0/+7
ProxyGroup (#14583) cmd/k8s-operator: add logic to parse L7 Ingresses in HA mode - Wrap the Tailscale API client used by the Kubernetes Operator into a client that knows how to manage VIPServices. - Create/Delete VIPServices and update serve config for L7 Ingresses for ProxyGroup. - Ensure that ingress ProxyGroup proxies mount serve config from a shared ConfigMap. Updates tailscale/corp#24795 Signed-off-by: Irbe Krumina <irbe@tailscale.com>
2025-01-10cmd/containerboot: load containerboot serve config that does not contain ↵Irbe Krumina2-11/+290
HTTPS endpoint in tailnets with HTTPS disabled (#14538) cmd/containerboot: load containerboot serve config that does not contain HTTPS endpoint in tailnets with HTTPS disabled Fixes an issue where, if a tailnet has HTTPS disabled, no serve config set via TS_SERVE_CONFIG was loaded, even if it does not contain an HTTPS endpoint. Now for tailnets with HTTPS disabled serve config provided to containerboot is considered invalid (and therefore not loaded) only if there is an HTTPS endpoint defined in the config. Fixes tailscale/tailscale#14495 Signed-off-by: Irbe Krumina <irbe@tailscale.com>
2025-01-10cmd/containerboot,cmd/k8s-operator: reload tailscaled config (#14342)Irbe Krumina2-0/+78
cmd/{k8s-operator,containerboot}: reload tailscaled configfile when its contents have changed Instead of restarting the Kubernetes Operator proxies each time tailscaled config has changed, this dynamically reloads the configfile using the new reload endpoint. Older annotation based mechanism will be supported till 1.84 to ensure that proxy versions prior to 1.80 keep working with operator 1.80 and newer. Updates tailscale/tailscale#13032 Updates tailscale/corp#24795 Signed-off-by: Irbe Krumina <irbe@tailscale.com>
2024-12-11cmd/containerboot: don't attempt to patch a Secret field without permissions ↵Irbe Krumina3-1/+3
(#14365) Signed-off-by: Irbe Krumina <irbe@tailscale.com>
2024-12-11cmd/containerboot: add more tests, check that egress service config only set ↵Irbe Krumina2-8/+119
on kube (#14360) Updates tailscale/tailscale#14357 Signed-off-by: Irbe Krumina <irbe@tailscale.com>
2024-12-11cmd/containerboot: don't attempt to write kube Secret in non-kube ↵Irbe Krumina1-2/+4
environments (#14358) Updates tailscale/tailscale#14354 Signed-off-by: Irbe Krumina <irbe@tailscale.com>
2024-12-11cmd/containerboot: guard kubeClient against nil dereference (#14357)Bjorn Neergaard1-2/+4
A method on kc was called unconditionally, even if was not initialized, leading to a nil pointer dereference when TS_SERVE_CONFIG was set outside Kubernetes. Add a guard symmetric with other uses of the kubeClient. Fixes #14354. Signed-off-by: Bjorn Neergaard <bjorn@neersighted.com>
2024-12-04cmd/{containerboot,k8s-operator},kube/kubetypes: kube Ingress L7 proxies ↵Irbe Krumina6-89/+194
only advertise HTTPS endpoint when ready (#14171) cmd/containerboot,kube/kubetypes,cmd/k8s-operator: detect if Ingress is created in a tailnet that has no HTTPS This attempts to make Kubernetes Operator L7 Ingress setup failures more explicit: - the Ingress resource now only advertises HTTPS endpoint via status.ingress.loadBalancer.hostname when/if the proxy has succesfully loaded serve config - the proxy attempts to catch cases where HTTPS is disabled for the tailnet and logs a warning Updates tailscale/tailscale#12079 Updates tailscale/tailscale#10407 Signed-off-by: Irbe Krumina <irbe@tailscale.com>
2024-12-02cmd/containerboot: serve health on local endpoint (#14246)Tom Proctor5-63/+248
* cmd/containerboot: serve health on local endpoint We introduced stable (user) metrics in #14035, and `TS_LOCAL_ADDR_PORT` with it. Rather than requiring users to specify a new addr/port combination for each new local endpoint they want the container to serve, this combines the health check endpoint onto the local addr/port used by metrics if `TS_ENABLE_HEALTH_CHECK` is used instead of `TS_HEALTHCHECK_ADDR_PORT`. `TS_LOCAL_ADDR_PORT` now defaults to binding to all interfaces on 9002 so that it works more seamlessly and with less configuration in environments other than Kubernetes, where the operator always overrides the default anyway. In particular, listening on localhost would not be accessible from outside the container, and many scripted container environments do not know the IP address of the container before it's started. Listening on all interfaces allows users to just set one env var (`TS_ENABLE_METRICS` or `TS_ENABLE_HEALTH_CHECK`) to get a fully functioning local endpoint they can query from outside the container. Updates #14035, #12898 Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
2024-11-23cmd/containerboot: preserve headers of metrics endpoints responses (#14204)Irbe Krumina1-1/+1
Updates tailscale/tailscale#11292 Signed-off-by: Irbe Krumina <irbe@tailscale.com>
2024-11-22cmd/{containerboot,k8s-operator},k8s-operator: new options to expose user ↵Tom Proctor5-2/+127
metrics (#14035) containerboot: Adds 3 new environment variables for containerboot, `TS_LOCAL_ADDR_PORT` (default `"${POD_IP}:9002"`), `TS_METRICS_ENABLED` (default `false`), and `TS_DEBUG_ADDR_PORT` (default `""`), to configure metrics and debug endpoints. In a follow-up PR, the health check endpoint will be updated to use the `TS_LOCAL_ADDR_PORT` if `TS_HEALTHCHECK_ADDR_PORT` hasn't been set. Users previously only had access to internal debug metrics (which are unstable and not recommended) via passing the `--debug` flag to tailscaled, but can now set `TS_METRICS_ENABLED=true` to expose the stable metrics documented at https://tailscale.com/kb/1482/client-metrics at `/metrics` on the addr/port specified by `TS_LOCAL_ADDR_PORT`. Users can also now configure a debug endpoint more directly via the `TS_DEBUG_ADDR_PORT` environment variable. This is not recommended for production use, but exposes an internal set of debug metrics and pprof endpoints. operator: The `ProxyClass` CRD's `.spec.metrics.enable` field now enables serving the stable user metrics documented at https://tailscale.com/kb/1482/client-metrics at `/metrics` on the same "metrics" container port that debug metrics were previously served on. To smooth the transition for anyone relying on the way the operator previously consumed this field, we also _temporarily_ serve tailscaled's internal debug metrics on the same `/debug/metrics` path as before, until 1.82.0 when debug metrics will be turned off by default even if `.spec.metrics.enable` is set. At that point, anyone who wishes to continue using the internal debug metrics (not recommended) will need to set the new `ProxyClass` field `.spec.statefulSet.pod.tailscaleContainer.debug.enable`. Users who wish to opt out of the transitional behaviour, where enabling `.spec.metrics.enable` also enables debug metrics, can set `.spec.statefulSet.pod.tailscaleContainer.debug.enable` to false (recommended). Separately but related, the operator will no longer specify a host port for the "metrics" container port definition. This caused scheduling conflicts when k8s needs to schedule more than one proxy per node, and was not necessary for allowing the pod's port to be exposed to prometheus scrapers. Updates #11292 --------- Co-authored-by: Kristoffer Dalby <kristoffer@tailscale.com> Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
2024-11-19kube/{kubeapi,kubeclient},ipn/store/kubestore,cmd/{containerboot,k8s-operato ↵Irbe Krumina2-3/+3
r}: emit kube store Events (#14112) Adds functionality to kube client to emit Events. Updates kube store to emit Events when tailscaled state has been loaded, updated or if any errors where encountered during those operations. This should help in cases where an error related to state loading/updating caused the Pod to crash in a loop- unlike logs of the originally failed container instance, Events associated with the Pod will still be accessible even after N restarts. Updates tailscale/tailscale#14080 Signed-off-by: Irbe Krumina <irbe@tailscale.com>
2024-11-12cmd/{k8s-operator,containerboot},k8s-operator: remove support for proxies ↵Irbe Krumina1-5/+4
below capver 95. (#13986) Updates tailscale/tailscale#13984 Signed-off-by: Irbe Krumina <irbe@tailscale.com>
2024-10-08cmd/{k8s-operator,containerboot},kube/egressservices: fix Pod IP check for ↵Irbe Krumina3-35/+65
dual stack clusters (#13721) Currently egress Services for ProxyGroup only work for Pods and Services with IPv4 addresses. Ensure that it works on dual stack clusters by reading proxy Pod's IP from the .status.podIPs list that always contains both IPv4 and IPv6 address (if the Pod has them) rather than .status.podIP that could contain IPv6 only for a dual stack cluster. Updates tailscale/tailscale#13406 Signed-off-by: Irbe Krumina <irbe@tailscale.com>
2024-10-08cmd/containerboot: simplify k8s setup logic (#13627)Tom Proctor1-29/+36
Rearrange conditionals to reduce indentation and make it a bit easier to read the logic. Also makes some error message updates for better consistency with the recent decision around capitalising resource names and the upcoming addition of config secrets. Updates #cleanup Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
2024-10-07cmd/{containerboot,k8s-operator},k8s-operator,kube: add ProxyGroup ↵Tom Proctor1-1/+1
controller (#13684) Implements the controller for the new ProxyGroup CRD, designed for running proxies in a high availability configuration. Each proxy gets its own config and state Secret, and its own tailscale node ID. We are currently mounting all of the config secrets into the container, but will stop mounting them and instead read them directly from the kube API once #13578 is implemented. Updates #13406 Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>