Age | Commit message | Author | Files | Lines
2024-12-06 | Finish up the fix, automated test [tomhjp/consistent-state-test] | Tom Proctor | 3 | -176/+162
Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
2024-12-05 | local interactive test code | Tom Proctor | 1 | -13/+124
Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
2024-12-05 | cmd/containerboot: wait for consistent state on shutdown | Tom Proctor | 3 | -0/+117
tailscaled's ipn package writes a collection of keys to state after authenticating to control, but one at a time. If containerboot happens to send a SIGTERM signal to tailscaled in the middle of writing those keys, it may shut down with an inconsistent state Secret and never recover. While we can't durably fix this with our current single-use auth keys (no atomic operation to auth + write state), we can reduce the window for this race condition by checking for partial state before sending SIGTERM to tailscaled. Best effort only. Updates #14080 Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
2024-12-05 | cmd/k8s-operator: don't error for transient failures (#14073) | Tom Proctor | 8 | -17/+84
Every so often, the ProxyGroup and other controllers lose an optimistic locking race with other controllers that update the objects they create. Stop treating this as an error event, and instead just log an info level log line for it. Fixes #14072 Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
2024-12-04 | cmd/tailscale,net/netcheck: add debug feature to force preferred DERP | James Tucker | 7 | -1/+140
This provides an interface for a user to force a preferred DERP outcome for all future netchecks that will take precedence unless the forced region is unreachable. The option does not persist and will be lost when the daemon restarts. Updates tailscale/corp#18997 Updates tailscale/corp#24755 Signed-off-by: James Tucker <james@tailscale.com>
2024-12-04 | net/tstun: remove tailscaled_outbound_dropped_packets_total reason=acl metric for now | Brad Fitzpatrick | 2 | -4/+5
Updates #14280 Change-Id: Idff102b3d7650fc9dfbe0c340168806bdf542d76 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2024-12-04 | cmd/{containerboot,k8s-operator},kube/kubetypes: kube Ingress L7 proxies only advertise HTTPS endpoint when ready (#14171) | Irbe Krumina | 12 | -128/+443
cmd/containerboot,kube/kubetypes,cmd/k8s-operator: detect if Ingress is created in a tailnet that has no HTTPS
This attempts to make Kubernetes Operator L7 Ingress setup failures more explicit:
- the Ingress resource now only advertises its HTTPS endpoint via status.ingress.loadBalancer.hostname when/if the proxy has successfully loaded serve config
- the proxy attempts to catch cases where HTTPS is disabled for the tailnet and logs a warning
Updates tailscale/tailscale#12079 Updates tailscale/tailscale#10407 Signed-off-by: Irbe Krumina <irbe@tailscale.com>
2024-12-04 | cmd/k8s-operator: fix a bunch of status equality checks (#14270) | Irbe Krumina | 8 | -15/+15
Updates tailscale/tailscale#14269 Signed-off-by: Irbe Krumina <irbe@tailscale.com>
2024-12-03 | cmd/k8s-operator/deploy/chart: allow reading OAuth creds from a CSI driver's volume and annotating operator's Service account (#14264) | Oliver Rahner | 3 | -4/+30
cmd/k8s-operator/deploy/chart: allow reading OAuth creds from a CSI driver's volume and annotating operator's Service account Updates #14264 Signed-off-by: Oliver Rahner <o.rahner@dke-data.com>
2024-12-03 | cmd/k8s-operator: avoid port collision with metrics endpoint (#14185) | Tom Proctor | 1 | -7/+7
When the operator enables metrics on a proxy, it uses port 9001, and in the near future it will start using 9002 for the debug endpoint as well. Make sure we don't choose ports from a range that includes 9001 so that we never clash. Setting TS_SOCKS5_SERVER, TS_HEALTHCHECK_ADDR_PORT, TS_OUTBOUND_HTTP_PROXY_LISTEN, and PORT could also open arbitrary ports, so we will need to document that users should not choose ports from the 10000-11000 range for those settings. Updates #13406 Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
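The port-range reasoning above can be made concrete with a small Go sketch. Everything here is hypothetical: the constant and function names are invented, and treating 11000 as an exclusive upper bound is an assumption. It only demonstrates that allocating from 10000-11000 can never collide with the reserved metrics (9001) and debug (9002) ports.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical allocation range; the upper bound is treated as exclusive.
const (
	portRangeStart = 10000
	portRangeEnd   = 11000
)

// Ports the commit message says are used for metrics and debug endpoints.
var reservedPorts = []int{9001, 9002}

// rangeAvoidsReserved reports whether no reserved port falls inside
// [portRangeStart, portRangeEnd).
func rangeAvoidsReserved() bool {
	for _, p := range reservedPorts {
		if p >= portRangeStart && p < portRangeEnd {
			return false
		}
	}
	return true
}

// allocatePort returns the lowest port in the range that is not already used.
func allocatePort(used map[int]bool) (int, error) {
	for p := portRangeStart; p < portRangeEnd; p++ {
		if !used[p] {
			return p, nil
		}
	}
	return 0, errors.New("no free port in range")
}

func main() {
	fmt.Println(rangeAvoidsReserved()) // true
	p, err := allocatePort(map[int]bool{10000: true, 10001: true})
	fmt.Println(p, err) // 10002 <nil>
}
```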
2024-12-03 | cmd/k8s-operator,k8s-operator,go.mod: optionally create ServiceMonitor (#14248) | Irbe Krumina | 21 | -22/+877
* cmd/k8s-operator,k8s-operator,go.mod: optionally create ServiceMonitor Adds a new spec.metrics.serviceMonitor field to ProxyClass. If that's set to true (and metrics are enabled), the operator will create a Prometheus ServiceMonitor for each proxy to which the ProxyClass applies. Additionally, create a metrics Service for each proxy that has metrics enabled. Updates tailscale/tailscale#11292 Signed-off-by: Irbe Krumina <irbe@tailscale.com>
2024-12-03 | cmd/k8s-operator,docs/k8s: run tun mode proxies in privileged containers (#14262) | Irbe Krumina | 9 | -41/+36
We were previously relying on unintended behaviour by runc where all containers were by default given read/write/mknod permissions for tun devices. This behaviour was removed in https://github.com/opencontainers/runc/pull/3468 and released in runc 1.2. The containerd container runtime, used by Docker and the majority of Kubernetes distributions, bumped runc to 1.2 in 1.7.24 (https://github.com/containerd/containerd/releases/tag/v1.7.24), thus breaking our reference tun mode Tailscale Kubernetes manifests and Kubernetes operator proxies. This PR changes all Kubernetes container configs that run Tailscale in tun mode to privileged. This should not be a breaking change because all these containers would run in a Pod that already has a privileged init container. Updates tailscale/tailscale#14256 Updates tailscale/tailscale#10814 Signed-off-by: Irbe Krumina <irbe@tailscale.com>
2024-12-02 | IPN: Update ServeConfig to accept configuration for Services. | KevinLiang10 | 4 | -2/+144
This commit updates ServeConfig to allow configuring Services (VIPServices for now) via Serve. The scope of this commit is limited to adding the Services field to ServeConfig; the field does not yet allow packets to flow. The purpose of this commit is to unblock other work on the k8s end. Updates #22953 Signed-off-by: KevinLiang10 <37811973+KevinLiang10@users.noreply.github.com>
2024-12-02 | net/netcheck: clean up ICMP probe AddrPort lookup | Brad Fitzpatrick | 2 | -29/+36
Fixes #14200 Change-Id: Ib086814cf63dda5de021403fe1db4fb2a798eaae Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2024-12-02 | cmd/containerboot: serve health on local endpoint (#14246) | Tom Proctor | 7 | -66/+251
* cmd/containerboot: serve health on local endpoint We introduced stable (user) metrics in #14035, and `TS_LOCAL_ADDR_PORT` with it. Rather than requiring users to specify a new addr/port combination for each new local endpoint they want the container to serve, this combines the health check endpoint onto the local addr/port used by metrics if `TS_ENABLE_HEALTH_CHECK` is used instead of `TS_HEALTHCHECK_ADDR_PORT`. `TS_LOCAL_ADDR_PORT` now defaults to binding to all interfaces on 9002 so that it works more seamlessly and with less configuration in environments other than Kubernetes, where the operator always overrides the default anyway. In particular, listening on localhost would not be accessible from outside the container, and many scripted container environments do not know the IP address of the container before it's started. Listening on all interfaces allows users to just set one env var (`TS_ENABLE_METRICS` or `TS_ENABLE_HEALTH_CHECK`) to get a fully functioning local endpoint they can query from outside the container. Updates #14035, #12898 Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
2024-12-02 | cmd/checkmetrics: add command for checking metrics against kb | Brad Fitzpatrick | 2 | -0/+142
This commit adds a command to validate that all the metrics that are registered in the client are also present in a path or URL. It is intended to be run from the KB against the latest version of tailscale. Updates tailscale/corp#24066 Updates tailscale/corp#22075 Co-Authored-By: Brad Fitzpatrick <bradfitz@tailscale.com> Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
2024-11-29 | cmd/k8s-operator: always set stateful filtering to false (#14216) | Irbe Krumina | 3 | -22/+11
Updates tailscale/tailscale#12108 Signed-off-by: Irbe Krumina <irbe@tailscale.com>
2024-11-29 | Makefile,./build_docker.sh: update kube operator image build target name (#14251) | Irbe Krumina | 2 | -2/+2
Updates tailscale/corp#24540 Updates tailscale/tailscale#12914 Signed-off-by: Irbe Krumina <irbe@tailscale.com>
2024-11-29 | cmd/k8s-operator: fix port name change bug for egress ProxyGroup proxies (#14247) | Irbe Krumina | 3 | -24/+77
Ensure that the ExternalName Service port names are always synced to the ClusterIP Service, to fix a bug where if users created a Service with a single unnamed port and later changed to 1+ named ports, the operator attempted to apply an invalid multi-port Service with an unnamed port. Also fixes a small internal issue where not-yet Service status conditions were lost on a spec update. Updates tailscale/tailscale#10102 Signed-off-by: Irbe Krumina <irbe@tailscale.com>
2024-11-28 | tsnet: remove flaky test marker from metrics | Kristoffer Dalby | 1 | -4/+4
Updates #13420 Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
2024-11-28 | wgengine/magicsock: packet/bytes metrics should not count disco | Kristoffer Dalby | 1 | -3/+3
Updates #13420 Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
2024-11-28 | tsnet: validate sent data in metrics test | Kristoffer Dalby | 1 | -7/+13
Updates #13420 Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
2024-11-28 | tsnet: split bytes and routes metrics tests | Kristoffer Dalby | 1 | -61/+123
Updates #13420 Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
2024-11-28 | tsnet: send less data in metrics integration test | Kristoffer Dalby | 1 | -8/+6
This commit reduces the amount of data sent in the metrics data integration test from 10MB to 1MB. On various machines 10MB was quite flaky, while 1MB has not failed once in 10000 runs. Updates #13420 Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
2024-11-28 | health: move health metrics test to health_test | Kristoffer Dalby | 3 | -33/+50
Updates #13420 Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
2024-11-27 | logtail: avoid bytes.Buffer allocation (#11858) | Joe Tsai | 1 | -2/+10
Re-use a pre-allocated bytes.Buffer struct and shallow copy the result of bytes.NewBuffer into it to avoid allocating the struct. Note that we're only reusing the bytes.Buffer struct itself and not the underlying []byte temporarily stored within it. Updates #cleanup Updates tailscale/corp#18514 Updates golang/go#67004 Signed-off-by: Joe Tsai <joetsai@digital-static.net>
2024-11-27 | ipn/localapi: count localapi requests to metric endpoints | Anton Tolchanov | 1 | -1/+5
Updates tailscale/corp#22075 Signed-off-by: Anton Tolchanov <anton@tailscale.com>
2024-11-26 | control/controlhttp: set *health.Tracker in tests | Andrew Dunham | 1 | -0/+3
Observed during another PR: https://github.com/tailscale/tailscale/actions/runs/12040045880/job/33569141807 Updates #cleanup Signed-off-by: Andrew Dunham <andrew@du.nham.ca> Change-Id: I9e0f49a35485fa2e097892737e5e3c95bf775a90
2024-11-26 | cmd/tailscale/cli: fix format string | Nick Khyl | 1 | -2/+2
Updates #12687 Signed-off-by: Nick Khyl <nickk@tailscale.com>
2024-11-26 | ipn/ipnlocal: only check CanUseExitNode if we are attempting to use one (#14230) | Mario Minardi | 1 | -1/+6
In https://github.com/tailscale/tailscale/pull/13726 we added logic to `checkExitNodePrefsLocked` to error out on platforms where using an exit node is unsupported in order to give users more obvious feedback than having this silently fail downstream. The above change neglected to properly check whether the device in question was actually trying to use an exit node when doing the check and was incorrectly returning an error on any calls to `checkExitNodePrefsLocked` on platforms where using an exit node is not supported as a result. This change remedies this by adding a check to see whether the device is attempting to use an exit node before doing the `CanUseExitNode` check. Updates https://github.com/tailscale/corp/issues/24835 Signed-off-by: Mario Minardi <mario@tailscale.com>
2024-11-25 | net/netmon: improve panic reporting from #14202 | James Tucker | 1 | -2/+5
I was hoping we'd catch an example input quickly, but the reporter had rebooted their machine and it is no longer exhibiting the behavior. As such this code may be sticking around quite a bit longer and we might encounter other errors, so include the panic in the log entry. Updates #14201 Updates #14202 Updates golang/go#70528 Signed-off-by: James Tucker <james@tailscale.com>
2024-11-25 | docs/windows/policy: update ADMX policy definitions to reflect the syspolicy settings | Nick Khyl | 2 | -51/+91
We add a policy definition for the AllowedSuggestedExitNodes syspolicy setting, allowing admins to configure a list of exit node IDs to be used as a pool for automatic suggested exit node selection. We update definitions for policy settings configurable on both a per-user and per-machine basis, such as UI customizations, to specify class="Both". Lastly, we update the help text for existing policy definitions to include a link to the KB article as the last line instead of in the first paragraph. Updates #12687 Updates tailscale/corp#19681 Signed-off-by: Nick Khyl <nickk@tailscale.com>
2024-11-23 | cmd/containerboot: preserve headers of metrics endpoints responses (#14204) | Irbe Krumina | 1 | -1/+1
Updates tailscale/tailscale#11292 Signed-off-by: Irbe Krumina <irbe@tailscale.com>
2024-11-22 | net/netmon: catch ParseRIB panic to gather buffer data | James Tucker | 1 | -1/+9
Updates #14201 Updates golang/go#70528 Signed-off-by: James Tucker <james@tailscale.com>
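Catching a parser panic so the offending buffer can be logged is a standard `defer`/`recover` pattern. The Go sketch below is a hypothetical illustration: `parseSafely` and the `parse` callback are invented names, standing in for a wrapper around a call such as `route.ParseRIB` from `golang.org/x/net/route`. The panic is converted into an error that carries the raw input in hex.

```go
package main

import "fmt"

// parseSafely calls parse and converts a panic into an error that includes
// the raw input, so a problematic buffer can be recovered from logs.
func parseSafely(b []byte, parse func([]byte) (int, error)) (n int, err error) {
	defer func() {
		if r := recover(); r != nil {
			n = 0
			err = fmt.Errorf("parse panicked: %v (input %d bytes: %x)", r, len(b), b)
		}
	}()
	return parse(b)
}

func main() {
	// A buggy parser that trusts a length byte taken from the input.
	buggy := func(b []byte) (int, error) {
		l := int(b[0])
		return len(b[1 : 1+l]), nil
	}
	_, err := parseSafely([]byte{5, 1, 2}, buggy)
	fmt.Println(err)
}
```

Named results are what let the deferred function replace the return values after a panic.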
2024-11-22 | ipn/ipnlocal: rebuild allowed suggested exit nodes when syspolicy changes | Nick Khyl | 1 | -5/+38
In this PR, we update LocalBackend to rebuild the set of allowed suggested exit nodes whenever the AllowedSuggestedExitNodes syspolicy setting changes. Additionally, we request a new suggested exit node when this occurs, enabling its use if the ExitNodeID syspolicy setting is set to auto:any. Updates #12687 Signed-off-by: Nick Khyl <nickk@tailscale.com>
2024-11-22 | control/controlclient: use the most recent syspolicy.MachineCertificateSubject value | Nick Khyl | 1 | -11/+2
This PR removes the sync.Once wrapper around retrieving the MachineCertificateSubject policy setting value, ensuring the most recent version is always used if it changes after the service starts. Although this policy setting is used by a very limited number of customers, recent support escalations have highlighted issues caused by outdated or incorrect policy values being applied. Updates #12687 Signed-off-by: Nick Khyl <nickk@tailscale.com>
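The difference between the old and new behaviour is easy to see in a small Go sketch: a value read through `sync.Once` is pinned at its first read, while reading on each use sees later policy changes. The `policyStore` type and `cachedGetter` helper are hypothetical, and the certificate-subject strings are made-up sample values.

```go
package main

import (
	"fmt"
	"sync"
)

// policyStore is a hypothetical policy source whose value can change at runtime.
type policyStore struct {
	mu    sync.Mutex
	value string
}

func (s *policyStore) Set(v string) { s.mu.Lock(); s.value = v; s.mu.Unlock() }
func (s *policyStore) Get() string  { s.mu.Lock(); defer s.mu.Unlock(); return s.value }

// cachedGetter mimics the old behaviour: sync.Once pins the first value read.
func cachedGetter(s *policyStore) func() string {
	var once sync.Once
	var v string
	return func() string {
		once.Do(func() { v = s.Get() })
		return v
	}
}

func main() {
	s := &policyStore{value: "CN=old"}
	cached := cachedGetter(s)
	fresh := s.Get // the new behaviour: read the current value on each use
	fmt.Println(cached(), fresh()) // CN=old CN=old
	s.Set("CN=new")
	fmt.Println(cached(), fresh()) // CN=old CN=new
}
```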
2024-11-22 | ipn/ipnlocal: update ipn.Prefs when there's a change in syspolicy settings | Nick Khyl | 2 | -26/+199
In this PR, we update ipnlocal.NewLocalBackend to subscribe to policy change notifications and reapply syspolicy settings to the current profile's ipn.Prefs whenever a change occurs. Updates #12687 Signed-off-by: Nick Khyl <nickk@tailscale.com>
2024-11-22 | ipn/ipnlocal: move syspolicy handling from setExitNodeID to applySysPolicy | Nick Khyl | 2 | -45/+56
This moves code that handles ExitNodeID/ExitNodeIP syspolicy settings from (*LocalBackend).setExitNodeID to applySysPolicy. Updates #12687 Signed-off-by: Nick Khyl <nickk@tailscale.com>
2024-11-22 | cmd/tailscaled: log SCM interactions if the policy setting is enabled at the time of interaction | Nick Khyl | 1 | -5/+4
This updates the syspolicy.LogSCMInteractions check to run at the time of an interaction, just before logging a message, instead of during service startup. This ensures the most recent policy setting is used if it has changed since the service started. Updates #12687 Signed-off-by: Nick Khyl <nickk@tailscale.com>
2024-11-22 | cmd/tailscaled: flush DNS if FlushDNSOnSessionUnlock is true upon receiving a session change notification | Nick Khyl | 1 | -11/+10
In this PR, we move the syspolicy.FlushDNSOnSessionUnlock check from service startup to when a session change notification is received. This ensures that the most recent policy setting value is used if it has changed since the service started. We also plan to handle session change notifications for unrelated reasons and need to decouple notification subscriptions from DNS anyway. Updates #12687 Updates tailscale/corp#18342 Signed-off-by: Nick Khyl <nickk@tailscale.com>
2024-11-22 | util/syspolicy/rsop: reduce policyReloadMinDelay and policyReloadMaxDelay when in tests | Nick Khyl | 3 | -9/+15
These delays determine how soon syspolicy change callbacks are invoked after a policy setting is updated in a policy source. For tests, we shorten these delays to minimize unnecessary wait times. This adjustment only affects tests that subscribe to policy change notifications and modify policy settings after they have already been set. Initial policy settings are always available immediately without delay. Updates #12687 Signed-off-by: Nick Khyl <nickk@tailscale.com>
2024-11-22 | ipn/{ipnlocal,localapi}, wgengine/netstack: call (*LocalBackend).Shutdown when tests that create them complete | Nick Khyl | 4 | -0/+8
We have several places where LocalBackend instances are created for testing, but they are rarely shut down when the tests that created them exit. In this PR, we update newTestLocalBackend and similar functions to use testing.TB.Cleanup(lb.Shutdown) to ensure LocalBackend instances are properly shut down during test cleanup. Updates #12687 Signed-off-by: Nick Khyl <nickk@tailscale.com>
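The `Cleanup(lb.Shutdown)` pattern can be sketched in Go without `go test` by depending only on the one method it needs. In this hypothetical sketch, `cleaner` is a narrow interface that `*testing.T` and `*testing.B` satisfy, `backend` stands in for LocalBackend, and `fakeT` runs registered cleanups last-in, first-out, as the testing package does.

```go
package main

import "fmt"

// cleaner is the one method of testing.TB this helper needs.
type cleaner interface {
	Cleanup(func())
}

// backend is a hypothetical stand-in for LocalBackend.
type backend struct{ shut bool }

func (b *backend) Shutdown() { b.shut = true }

// newTestBackend registers Shutdown with the test's cleanup list so every
// backend it creates is shut down when the test finishes.
func newTestBackend(t cleaner) *backend {
	b := &backend{}
	t.Cleanup(b.Shutdown)
	return b
}

// fakeT runs registered cleanups in last-in, first-out order.
type fakeT struct{ fns []func() }

func (f *fakeT) Cleanup(fn func()) { f.fns = append(f.fns, fn) }

func (f *fakeT) finish() {
	for i := len(f.fns) - 1; i >= 0; i-- {
		f.fns[i]()
	}
}

func main() {
	t := &fakeT{}
	b := newTestBackend(t)
	fmt.Println(b.shut) // false
	t.finish()
	fmt.Println(b.shut) // true
}
```

Registering the cleanup inside the constructor means callers cannot forget it, which is the failure mode the commit describes.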
2024-11-22 | cmd/{containerboot,k8s-operator},k8s-operator: new options to expose user metrics (#14035) | Tom Proctor | 14 | -34/+472
containerboot: Adds 3 new environment variables for containerboot, `TS_LOCAL_ADDR_PORT` (default `"${POD_IP}:9002"`), `TS_METRICS_ENABLED` (default `false`), and `TS_DEBUG_ADDR_PORT` (default `""`), to configure metrics and debug endpoints. In a follow-up PR, the health check endpoint will be updated to use the `TS_LOCAL_ADDR_PORT` if `TS_HEALTHCHECK_ADDR_PORT` hasn't been set. Users previously only had access to internal debug metrics (which are unstable and not recommended) via passing the `--debug` flag to tailscaled, but can now set `TS_METRICS_ENABLED=true` to expose the stable metrics documented at https://tailscale.com/kb/1482/client-metrics at `/metrics` on the addr/port specified by `TS_LOCAL_ADDR_PORT`. Users can also now configure a debug endpoint more directly via the `TS_DEBUG_ADDR_PORT` environment variable. This is not recommended for production use, but exposes an internal set of debug metrics and pprof endpoints. operator: The `ProxyClass` CRD's `.spec.metrics.enable` field now enables serving the stable user metrics documented at https://tailscale.com/kb/1482/client-metrics at `/metrics` on the same "metrics" container port that debug metrics were previously served on. To smooth the transition for anyone relying on the way the operator previously consumed this field, we also _temporarily_ serve tailscaled's internal debug metrics on the same `/debug/metrics` path as before, until 1.82.0 when debug metrics will be turned off by default even if `.spec.metrics.enable` is set. At that point, anyone who wishes to continue using the internal debug metrics (not recommended) will need to set the new `ProxyClass` field `.spec.statefulSet.pod.tailscaleContainer.debug.enable`. Users who wish to opt out of the transitional behaviour, where enabling `.spec.metrics.enable` also enables debug metrics, can set `.spec.statefulSet.pod.tailscaleContainer.debug.enable` to false (recommended).
Separately but related, the operator will no longer specify a host port for the "metrics" container port definition. This caused scheduling conflicts when k8s needs to schedule more than one proxy per node, and was not necessary for allowing the pod's port to be exposed to prometheus scrapers. Updates #11292 --------- Co-authored-by: Kristoffer Dalby <kristoffer@tailscale.com> Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
2024-11-22 | cmd/k8s-operator/deploy: ensure that operator can write kube state Events (#14177) | Irbe Krumina | 2 | -0/+16
A small follow-up to #14112 that ensures the operator itself can emit Events for its kube state store changes. Updates tailscale/tailscale#14080 Signed-off-by: Irbe Krumina <irbe@tailscale.com>
2024-11-21 | cli: present risk warning when setting up app connector on macOS (#14181) | Andrea Gottardo | 3 | -3/+23
2024-11-21 | net/tsaddr: include test input in test failure output | Brad Fitzpatrick | 1 | -2/+2
https://go.dev/wiki/CodeReviewComments#useful-test-failures (Previously it was using subtests with names including the input, but once those went away, there was no context left) Updates #14169 Change-Id: Ib217028183a3d001fe4aee58f2edb746b7b3aa88 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2024-11-20 | cmd/tailscale/cli: create netmon in debug ts2021 | Andrew Dunham | 2 | -0/+9
Otherwise we'll see a panic if we hit the dnsfallback code and try to call NewDialer with a nil NetMon. Updates #14161 Signed-off-by: Andrew Dunham <andrew@du.nham.ca> Change-Id: I81c6e72376599b341cb58c37134c2a948b97cf5f
2024-11-20 | util/fastuuid: delete unused package | Brad Fitzpatrick | 2 | -128/+0
Its sole user was deleted in 02cafbe1cadfc. And it has no public users: https://pkg.go.dev/tailscale.com/util/fastuuid?tab=importedby And nothing in other Tailscale repos that I can find. Updates tailscale/corp#24721 Change-Id: I8755770a255a91c6c99f596e6d10c303b3ddf213 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2024-11-20 | tsweb: change RequestID format to have a date in it | Brad Fitzpatrick | 5 | -13/+35
So we can locate them in logs more easily. Updates tailscale/corp#24721 Change-Id: Ia766c75608050dde7edc99835979a6e9bb328df2 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2024-11-20 | net/tsaddr: extract IsTailscaleIPv4 from IsTailscaleIP (#14169) | James Scott | 2 | -2/+76
Extracts tsaddr.IsTailscaleIPv4 out of tsaddr.IsTailscaleIP. This will allow for checking valid Tailscale assigned IPv4 addresses without checking IPv6 addresses. Updates #14168 Updates tailscale/corp#24620 Signed-off-by: James Scott <jim@tailscale.com>