path: root/logtail
AgeCommit message (Collapse)AuthorFilesLines
2026-04-20logtail: run HTTP tests in-memory with memnet + synctestBrad Fitzpatrick1-42/+47
TestEncodeAndUploadMessages waited on the default 2s FlushDelay, making the logtail package the slowest non-integration test in the tree (~2s real time). Switch the shared harness from an httptest.Server-on-loopback to a memnet.Listener-backed *http.Server and run the tests inside synctest.Test, so fake time advances the flush timer instantly. Drops the net/http/httptest dependency from these tests. Combined with the TestMain non-localhost dial guard added in the previous commit, no test in this package can accidentally reach the real log.tailscale.com server. Whole package now runs in ~7ms. Updates tailscale/corp#28679 Change-Id: Ie0e7a6a79641384ed0eecb99d767e17cda8bb944 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-04-20logtail: add Config.Disabled to suppress the startup bannerBrad Fitzpatrick3-0/+84
NewLogger unconditionally writes a "logtail started" banner before it returns, which callers that later call Logger.SetEnabled(false) have no way to suppress: the banner is already buffered for upload by the time the caller gets the logger back. Add Config.Disabled so callers that know up front they want the logger to start disabled (e.g. Android's remote-logging opt-out) can seed the state before NewLogger's internal Write. The process- wide Disable kill switch still takes precedence; SetEnabled can still flip the state at runtime. Updates #13174 Updates tailscale/tailscale-android#695 Change-Id: Icc4fa88c198447cf0faa707264dac84e359fe52c Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
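The ordering problem described above can be sketched with a toy constructor. This is only an illustration under stated assumptions: `bannerConfig`, `bannerLogger`, and `newBannerLogger` are hypothetical names, not the logtail API; only the idea of seeding the disabled state before the constructor's internal write comes from the commit message.

```go
package main

import "fmt"

// bannerConfig is a hypothetical stand-in for a logger config.
type bannerConfig struct {
	Disabled bool // start with uploads disabled
}

// bannerLogger is a toy logger that buffers entries for upload.
type bannerLogger struct {
	disabled bool
	pending  []string
}

// newBannerLogger seeds the disabled state from the config and only then
// writes the startup banner, so a disabled logger never buffers it.
func newBannerLogger(cfg bannerConfig) *bannerLogger {
	l := &bannerLogger{disabled: cfg.Disabled}
	l.write("logtail started")
	return l
}

func (l *bannerLogger) write(s string) {
	if l.disabled {
		return // dropped without buffering
	}
	l.pending = append(l.pending, s)
}

// SetEnabled flips the state at runtime; it cannot undo earlier writes.
func (l *bannerLogger) SetEnabled(on bool) { l.disabled = !on }

func main() {
	late := newBannerLogger(bannerConfig{})
	late.SetEnabled(false) // too late: the banner is already buffered
	early := newBannerLogger(bannerConfig{Disabled: true})
	fmt.Println(len(late.pending), len(early.pending))
}
```

Disabling after construction leaves the banner in the buffer, while the config flag keeps the buffer empty from the start.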
2026-04-17logtail: add Logger.SetEnabled to toggle uploads at runtimeBrad Fitzpatrick3-1/+48
Callers that need to turn logtail uploads on and off in response to user preference or policy changes previously had no choice: the package-level Disable is a one-way kill switch intended for the controlplane DisableLogTail debug message, and requires a process restart to undo. Add a per-Logger disabled flag, toggled via SetEnabled, that drops incoming entries without buffering while disabled. The process-wide Disable still takes precedence, so a controlplane-issued kill switch cannot be overridden by a client setting it back on. To simplify https://github.com/tailscale/tailscale-android/pull/695 Updates #13174 Change-Id: I06e75bd719c851f5f837ca5b2d1e17f7c68355f0 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
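A minimal sketch of the precedence rule described above, assuming a process-wide flag and a per-logger flag that are both checked on every write. The names `globalDisabled` and `toyLogger` are hypothetical and do not reflect the actual logtail fields.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// globalDisabled models the process-wide, one-way kill switch.
var globalDisabled atomic.Bool

// toyLogger is a hypothetical logger with a per-instance disabled flag.
type toyLogger struct {
	disabled atomic.Bool
	buffered [][]byte
}

// SetEnabled toggles only the per-logger flag.
func (l *toyLogger) SetEnabled(on bool) { l.disabled.Store(!on) }

// Write drops the entry if either flag is set. The global flag is checked
// too, so re-enabling a logger cannot override the kill switch.
func (l *toyLogger) Write(p []byte) (int, error) {
	if globalDisabled.Load() || l.disabled.Load() {
		return len(p), nil // dropped without buffering
	}
	l.buffered = append(l.buffered, append([]byte(nil), p...))
	return len(p), nil
}

func main() {
	l := &toyLogger{}
	l.Write([]byte("kept"))
	l.SetEnabled(false)
	l.Write([]byte("dropped"))
	l.SetEnabled(true)
	globalDisabled.Store(true)
	l.Write([]byte("also dropped"))
	fmt.Println(len(l.buffered))
}
```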
2026-03-06all: use Go 1.26 things, run most gofix modernizersBrad Fitzpatrick2-5/+5
I omitted a lot of the min/max modernizers because they didn't result in more clear code. Some of it's older "for x := range 123". Also: errors.AsType, any, fmt.Appendf, etc. Updates #18682 Change-Id: I83a451577f33877f962766a5b65ce86f7696471c Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-02-10logtail/filch: fix filch test panic (#18660)James Scott2-27/+14
Updates rotateLocked so that we hold the activeStderrWriteForTest write lock around the dup2Stderr call, rather than acquiring it only after dup2 has already completed. This ensures no stderrWriteForTest calls can race with the dup2 syscall. The now-unused waitIdleStderrForTest has been removed. On macOS, dup2 and write on the same file descriptor are not atomic with respect to each other. When rotateLocked called dup2Stderr to redirect the stderr fd to a new file, concurrent goroutines calling stderrWriteForTest could observe the fd in a transiently invalid state, resulting in a "bad file descriptor" error. Fixes tailscale/corp#36953 Signed-off-by: James Scott <jim@tailscale.com>
2026-02-09.github/workflows: add macos runnerBrad Fitzpatrick1-0/+14
Fixes #18118 Change-Id: I118fcc6537af9ccbdc7ce6b78134e8059b0b5ccf Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-01-30logtail/filch: close Filch instances in TestConcurrentSameFile (#18571)Fernando Serboncini1-0/+2
On Windows, TempDir cleanup fails if file handles are still open. TestConcurrentSameFile wasn't closing Filch instances before exit. Fixes #18570 Signed-off-by: Fernando Serboncini <fserb@tailscale.com>
2026-01-28logtail/filch: fix panic in concurrent file access (#18555)Joe Tsai2-2/+25
In the event of multiple Filch instances being backed by the same file, it is possible that concurrent rotateLocked calls occur. One operation might clear the file, resulting in another skipping the call to resetReadBuffer, resulting in a later panic because the read index is invalid. To at least avoid the panic, always call resetReadBuffer. Note that the behavior of Filch is undefined when using the same file. While this avoids the panic, we may still experience data corruption or loss. Fixes #18552 Signed-off-by: Joe Tsai <joetsai@digital-static.net>
2026-01-23all: remove AUTHORS file and references to itWill Norris15-15/+15
This file was never truly necessary and has never actually been used in the history of Tailscale's open source releases. A Brief History of AUTHORS files --- The AUTHORS file was a pattern developed at Google, originally for Chromium, then adopted by Go and a bunch of other projects. The problem was that Chromium originally had a copyright line only recognizing Google as the copyright holder. Because Google (and most open source projects) do not require copyright assignment for contributions, each contributor maintains their copyright. Some large corporate contributors then tried to add their own name to the copyright line in the LICENSE file or in file headers. This quickly becomes unwieldy, and puts a tremendous burden on anyone building on top of Chromium, since the license requires that they keep all copyright lines intact. The compromise was to create an AUTHORS file that would list all of the copyright holders. The LICENSE file and source file headers would then include that list by reference, listing the copyright holder as "The Chromium Authors". It also became cumbersome simply to keep the file up to date with a high rate of new contributors. Plus it's not always obvious who the copyright holder is. Sometimes it is the individual making the contribution, but many times it may be their employer. There is no way for the project maintainer to know. Eventually, Google changed their policy to no longer recommend trying to keep the AUTHORS file up to date proactively, and instead to only add to it when requested: https://opensource.google/docs/releasing/authors. They are also clear that: > Adding contributors to the AUTHORS file is entirely within the > project's discretion and has no implications for copyright ownership. It was primarily added to appease a small number of large contributors that insisted that they be recognized as copyright holders (which was entirely their right to do). 
But it's not truly necessary, and not even the most accurate way of identifying contributors and/or copyright holders. In practice, we've never added anyone to our AUTHORS file. It only lists Tailscale, so it's not really serving any purpose. It also causes confusion because Tailscalars put the "Tailscale Inc & AUTHORS" header in other open source repos which don't actually have an AUTHORS file, so it's ambiguous what that means. Instead, we just acknowledge that the contributors to Tailscale (whoever they are) are copyright holders for their individual contributions. We also have the benefit of using the DCO (developercertificate.org) which provides some additional certification of their right to make the contribution. The source file changes were purely mechanical with: git ls-files | xargs sed -i -e 's/\(Tailscale Inc &\) AUTHORS/\1 contributors/g' Updates #cleanup Change-Id: Ia101a4a3005adb9118051b3416f5a64a4a45987d Signed-off-by: Will Norris <will@tailscale.com>
2025-12-17net/netmon, wgengine/userspace: purge ChangeDelta.Major and address TODOs ↵Jonathan Nobels1-2/+2
(#17823) updates tailscale/corp#33891 Addresses several older TODOs in netmon. This removes the Major flag and precomputes the ChangeDelta state, rather than making consumers of ChangeDeltas sort that out themselves. We're also seeing a lot of ChangeDeltas being flagged as "Major" when they are not interesting, triggering rebinds in wgengine that are not needed. This cleans that up and adds a host of additional tests. The dependencies are cleaned, notably removing dependency on netmon itself for calculating what is interesting, and what is not. This includes letting individual platforms set a bespoke global "IsInterestingInterface" function. This is only used on Darwin. RebindRequired now roughly follows how "Major" was historically calculated but includes some additional checks for various uninteresting events such as changes in interface addresses that shouldn't trigger a rebind. This significantly reduces thrashing (by roughly half on Darwin clients when switching between NICs). The individual values that we roll into RebindRequired are also exposed so that components consuming netmap.ChangeDelta can ask more targeted questions. Signed-off-by: Jonathan Nobels <jonathan@tailscale.com>
2025-12-11logtail: add metrics (#18184)Joe Tsai3-3/+85
Add metrics about logtail uploading and underlying buffer. Add metrics to the in-memory buffer implementation. Updates tailscale/corp#21363 Signed-off-by: Joe Tsai <joetsai@digital-static.net>
2025-12-10logtail/filch: rewrite the package (#18143)Joe Tsai6-304/+732
The filch implementation is fairly broken: * When Filch.cur exceeds MaxFileSize, it calls moveContents to copy the entirety of cur into alt (while holding the write lock). By nature, this is the movement of a lot of data in a hot path, meaning that all log calls will be globally blocked! It also means that log uploads will be blocked during the move. * The implementation of moveContents is buggy in that it copies data from cur into the start of alt, but fails to truncate alt to the number of bytes copied. Consequently, there are unrelated lines near the end, leading to out-of-order lines when being read back. * Data filched via stderr do not directly respect MaxFileSize, which is only checked every 100 Filch.Write calls. This means that it is possible that the file grows far beyond the specified max file size before moveContents is called. * If both log files have data when New is called, it also copies the entirety of cur into alt. This can block the startup of a process copying lots of data before the process can do any useful work. * TryReadLine is implemented using bufio.Scanner. Unfortunately, it will choke on any lines longer than bufio.MaxScanTokenSize, rather than gracefully skip over them. The re-implementation avoids a lot of these problems by fundamentally eliminating the need for moveContents. We enforce MaxFileSize by simply rotating the log files whenever the current file exceeds MaxFileSize/2. This is a constant-time operation regardless of file size. To more gracefully handle lines longer than bufio.MaxScanTokenSize, we skip over these lines (without growing the read buffer) and report an error. This allows subsequent lines to be read. In order to improve debugging, we add a lot of metrics. Note that the mechanism of dup2 with stderr is inherently racy with the two-file approach. The order of operations during a rotation is carefully chosen to reduce the race window to be as short as possible. Thus, this is slightly less racy than before. 
Updates tailscale/corp#21363 Signed-off-by: Joe Tsai <joetsai@digital-static.net>
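The idea of skipping overlong lines without growing the read buffer can be sketched with the standard library's bufio.Reader, whose ReadSlice reports bufio.ErrBufferFull when a line does not fit. This is not the filch implementation; `readLine`, `readAll`, and `errLineTooLong` are hypothetical names used only for illustration.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"io"
	"strings"
)

// errLineTooLong is a hypothetical error reported for each skipped line.
var errLineTooLong = errors.New("line too long")

// readLine returns the next line without its trailing newline. A line that
// does not fit in the reader's fixed buffer is discarded piece by piece and
// reported as errLineTooLong, so later lines remain readable.
// An unterminated final line is dropped in this sketch.
func readLine(r *bufio.Reader) ([]byte, error) {
	line, err := r.ReadSlice('\n')
	if err == bufio.ErrBufferFull {
		// Keep consuming until the newline (or EOF) ends the long line.
		for err == bufio.ErrBufferFull {
			_, err = r.ReadSlice('\n')
		}
		if err != nil && err != io.EOF {
			return nil, err
		}
		return nil, errLineTooLong
	}
	if err != nil {
		return nil, err
	}
	return line[:len(line)-1], nil
}

// readAll reads every line from input using a buffer of bufSize bytes and
// counts how many lines were skipped for being too long.
func readAll(input string, bufSize int) (lines []string, skipped int) {
	r := bufio.NewReaderSize(strings.NewReader(input), bufSize)
	for {
		line, err := readLine(r)
		switch {
		case err == nil:
			lines = append(lines, string(line))
		case errors.Is(err, errLineTooLong):
			skipped++
		default:
			return lines, skipped // io.EOF or a real read error
		}
	}
}

func main() {
	input := "short\n" + strings.Repeat("x", 100) + "\nafter\n"
	lines, skipped := readAll(input, 16)
	fmt.Println(lines, skipped)
}
```

The buffer stays at its fixed size throughout; the cost of a long line is only the time to scan past it.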
2025-11-18all: rename variables with lowercase-l/uppercase-IAlex Chan2-161/+161
See http://go/no-ell Signed-off-by: Alex Chan <alexc@tailscale.com> Updates #cleanup Change-Id: I8c976b51ce7a60f06315048b1920516129cc1d5d
2025-11-16syncs: add Mutex/RWMutex alias/wrappers for future mutex debuggingBrad Fitzpatrick1-2/+3
Updates #17852 Change-Id: I477340fb8e40686870e981ade11cd61597c34a20 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2025-10-30logtail: avoid racing eventbus subscriptions with shutdown (#17695)M. J. Fromberger1-22/+28
In #17639 we moved the subscription into NewLogger to ensure we would not race subscribing with shutdown of the eventbus client. Doing so fixed that problem, but exposed another: As we were only servicing events occasionally when waiting for the network to come up, we could leave the eventbus to stall in cases where a number of network deltas arrived later and weren't processed. To address that, let's separate the concerns: As before, we'll Subscribe early to avoid conflicts with shutdown; but instead of using the subscriber directly to determine readiness, we'll keep track of the last-known network state in a selectable condition that the subscriber updates for us. When we want to wait, we'll wait on that condition (or until our context ends), ensuring all the events get processed in a timely manner. Updates #17638 Updates #15160 Change-Id: I28339a372be4ab24be46e2834a218874c33a0d2d Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
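One way to picture a "selectable condition" that a subscriber keeps up to date is a channel that is closed while the network is up and replaced when it goes down. The sketch below is an assumption about the general shape, not the logtail code; `netState` and its methods are hypothetical names.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// netState tracks the last-known network state. The ready channel is closed
// while the network is up, so waiters can select on it.
type netState struct {
	mu    sync.Mutex
	up    bool
	ready chan struct{}
}

func newNetState() *netState {
	return &netState{ready: make(chan struct{})}
}

// set is called by the event subscriber for every network change, whether
// or not anyone is currently waiting, so events never pile up.
func (s *netState) set(up bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if up == s.up {
		return
	}
	s.up = up
	if up {
		close(s.ready)
	} else {
		s.ready = make(chan struct{})
	}
}

func (s *netState) isUp() bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.up
}

func (s *netState) readyChan() <-chan struct{} {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.ready
}

// await blocks until the network is up or ctx ends.
func (s *netState) await(ctx context.Context) error {
	select {
	case <-s.readyChan():
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

func main() {
	s := newNetState()
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	fmt.Println(s.await(ctx)) // network down and ctx already canceled
	s.set(true)
	fmt.Println(s.await(context.Background())) // returns immediately
}
```

Separating the subscriber (which calls `set`) from the waiter (which calls `await`) is what lets events be drained promptly even when nothing is waiting.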
2025-10-28Revert "logtail: avoid racing eventbus subscriptions with Shutdown (#17639)" ↵M. J. Fromberger1-19/+20
(#17684) This reverts commit 4346615d77a6de16854c6e78f9d49375d6424e6e. We averted the shutdown race, but will need to service the subscriber even when we are not waiting for a change so that we do not delay the bus as a whole. Updates #17638 Change-Id: I5488466ed83f5ad1141c95267f5ae54878a24657 Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
2025-10-24logtail: avoid racing eventbus subscriptions with Shutdown (#17639)M. J. Fromberger1-20/+19
When the eventbus is enabled, set up the subscription for change deltas at the beginning when the client is created, rather than waiting for the first awaitInternetUp check. Otherwise, it is possible for a check to race with the client close in Shutdown, which triggers a panic. Updates #17638 Change-Id: I461c07939eca46699072b14b1814ecf28eec750c Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
2025-10-01net/netmon: remove usage of direct callbacks from netmon (#17292)Claus Lensbøl3-1/+39
The callback itself is not removed as it is used in other repos, making it simpler for those to slowly transition to the eventbus. Updates #15160 Signed-off-by: Claus Lensbøl <claus@tailscale.com>
2025-09-29feature/logtail: pull logtail + netlog out to modular featuresBrad Fitzpatrick4-52/+113
Removes 434 KB from the minimal Linux binary, or ~3%. Primarily this comes from not linking in the zstd encoding code. Fixes #17323 Change-Id: I0a90de307dfa1ad7422db7aa8b1b46c782bfaaf7 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2025-09-28util/backoff: rename logtail/backoff package to util/backoffBrad Fitzpatrick1-80/+0
It has nothing to do with logtail and is confusingly named like that. Updates #cleanup Updates #17323 Change-Id: Idd34587ba186a2416725f72ffc4c5778b0b9db4a Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2025-09-28logtail: delete AppendTextOrJSONLockedJoe Tsai1-5/+0
This was accidentally added in #11671 for testing. Nothing uses it. Updates tailscale/corp#21363 Signed-off-by: Joe Tsai <joetsai@digital-static.net>
2025-05-02logtail: remove unneeded IP redaction codeBrad Fitzpatrick2-123/+0
Updates tailscale/corp#15664 Change-Id: I9523a43860685048548890cf1931ee6cbd60452c Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2025-02-04logpolicy: expose MaxBufferSize and MaxUploadSize options (#14903)Joe Tsai1-3/+8
Updates tailscale/corp#26342 Signed-off-by: Joe Tsai <joetsai@digital-static.net>
2024-12-16Switch logging service from log.tailscale.io to log.tailscale.com (#14398)Joe Tsai5-9/+9
Updates tailscale/corp#23617 Signed-off-by: Joe Tsai <joetsai@digital-static.net>
2024-11-27logtail: avoid bytes.Buffer allocation (#11858)Joe Tsai1-2/+10
Re-use a pre-allocated bytes.Buffer struct and shallow-copy the result of bytes.NewBuffer into it to avoid allocating the struct. Note that we're only reusing the bytes.Buffer struct itself and not the underlying []byte temporarily stored within it. Updates #cleanup Updates tailscale/corp#18514 Updates golang/go#67004 Signed-off-by: Joe Tsai <joetsai@digital-static.net>
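The shallow-copy trick can be sketched as follows. The `uploader` type and `reader` method are hypothetical; whether the temporary returned by bytes.NewBuffer stays off the heap depends on the compiler's inlining and escape analysis, so treat the allocation claim as an assumption to verify with a benchmark.

```go
package main

import (
	"bytes"
	"fmt"
)

// uploader is a hypothetical type that keeps one bytes.Buffer struct around
// for reuse across uploads.
type uploader struct {
	buf bytes.Buffer
}

// reader points the reused struct at body by copying the value returned by
// bytes.NewBuffer into it. Only the small struct is reused; the body slice
// itself is not copied or retained beyond the next call.
func (u *uploader) reader(body []byte) *bytes.Buffer {
	u.buf = *bytes.NewBuffer(body)
	return &u.buf
}

func main() {
	var u uploader
	fmt.Println(u.reader([]byte("first")).String())
	fmt.Println(u.reader([]byte("second")).String())
}
```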
2024-07-12logtail: close idle HTTP connections on shutdownAnton Tolchanov1-0/+1
Fixes tailscale/corp#21609 Co-authored-by: Maisem Ali <maisem@tailscale.com> Signed-off-by: Anton Tolchanov <anton@tailscale.com>
2024-07-10all: add test for package comments, fix, add comments as neededBrad Fitzpatrick1-0/+1
Updates #cleanup Change-Id: Ic4304e909d2131a95a38b26911f49e7b1729aaef Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2024-06-05all: use math/rand/v2 moreMaisem Ali2-3/+3
Updates #11058 Signed-off-by: Maisem Ali <maisem@tailscale.com>
2024-05-23logtail/backoff: update Backoff.BackOff docs (#12229)Jordan Whited1-3/+2
Update #cleanup Signed-off-by: Jordan Whited <jordan@tailscale.com>
2024-04-27net/netns, net/dns/resolver, etc: make netmon required in most placesBrad Fitzpatrick1-1/+1
The goal is to move more network state accessors to netmon.Monitor where they can be cheaper/cached. But first (this change and others) we need to make sure the one netmon.Monitor is plumbed everywhere. Some notable bits: * tsdial.NewDialer is added, taking a now-required netmon * because a tsdial.Dialer always has a netmon, anything taking both a Dialer and a NetMon is now redundant; take only the Dialer and get the NetMon from that if/when needed. * netmon.NewStatic is added, primarily for tests Updates tailscale/corp#10910 Updates tailscale/corp#18960 Updates #7967 Updates #3299 Change-Id: I877f9cb87618c4eb037cee098241d18da9c01691 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2024-04-16all: use Go 1.22 range-over-intBrad Fitzpatrick2-5/+5
Updates #11058 Change-Id: I35e7ef9b90e83cac04ca93fd964ad00ed5b48430 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2024-04-12logtail: optimize JSON processing (#11671)Joe Tsai2-277/+421
Changes made: * Avoid "encoding/json" for JSON processing, and instead use "github.com/go-json-experiment/json/jsontext". Use jsontext.Value.IsValid for validation, which is much faster. Use jsontext.AppendQuote instead of our own JSON escaping. * In drainPending, use a different maxLen depending on lowMem. In lowMem mode, it is better to perform multiple uploads than it is to construct a large body that OOMs the process. * In drainPending, if an error is encountered draining, construct an error message in the logtail JSON format rather than something that is invalid JSON. * In appendTextOrJSONLocked, use jsontext.Decoder to check whether the input is a valid JSON object. This is faster than the previous approach of unmarshaling into map[string]any and then re-marshaling that data structure. This is especially beneficial for network flow logging, which produces relatively large JSON objects. * In appendTextOrJSONLocked, enforce maxSize on the input. If too large, then we may end up in a situation where the logs can never be uploaded because it exceeds the maximum body size that the Tailscale logs service accepts. * Use "tailscale.com/util/truncate" to properly truncate a string on valid UTF-8 boundaries. * In general, remove unnecessary spaces in JSON output. Performance: name old time/op new time/op delta WriteText 776ns ± 2% 596ns ± 1% -23.24% (p=0.000 n=10+10) WriteJSON 110µs ± 0% 9µs ± 0% -91.77% (p=0.000 n=8+8) name old alloc/op new alloc/op delta WriteText 448B ± 0% 0B -100.00% (p=0.000 n=10+10) WriteJSON 37.9kB ± 0% 0.0kB ± 0% -99.87% (p=0.000 n=10+10) name old allocs/op new allocs/op delta WriteText 1.00 ± 0% 0.00 -100.00% (p=0.000 n=10+10) WriteJSON 1.08k ± 0% 0.00k ± 0% -99.91% (p=0.000 n=10+10) For text payloads, this is 1.30x faster. For JSON payloads, this is 12.2x faster. Updates #cleanup Updates tailscale/corp#18514 Signed-off-by: Joe Tsai <joetsai@digital-static.net>
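Truncating on a valid UTF-8 boundary, mentioned in the list above, can be sketched with the standard library alone. `truncateUTF8` is a hypothetical helper, not the tailscale.com/util/truncate API, and it assumes the input is already valid UTF-8.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// truncateUTF8 returns a prefix of s that is at most n bytes long and does
// not end in the middle of a multi-byte character. It assumes s is valid
// UTF-8.
func truncateUTF8(s string, n int) string {
	if n < 0 {
		n = 0
	}
	if len(s) <= n {
		return s
	}
	// Back up until the byte at the cut point starts a new character.
	for n > 0 && !utf8.RuneStart(s[n]) {
		n--
	}
	return s[:n]
}

func main() {
	fmt.Println(truncateUTF8("héllo", 2)) // "é" would be split, so only "h" fits
	fmt.Println(truncateUTF8("héllo", 3))
}
```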
2024-04-08logtail: require Buffer.Write to not retain the provided slice (#11617)Joe Tsai1-3/+3
Buffer.Write has the exact same signature as io.Writer.Write. The latter requires implementations to never retain the provided input buffer, which is an expectation that most users will have when they see a Write signature. The current behavior of Buffer.Write where it does retain the input buffer is a risky precedent to set. Switch the behavior to match io.Writer.Write. There are only two implementations of Buffer in existence: * logtail.memBuffer * filch.Filch The former can be fixed by cloning the input to Write. This will cause an extra allocation in every Write, but we can fix that with pooling on the caller side in a follow-up PR. The latter only passes the input to os.File.Write, which does respect the io.Writer.Write requirements. Updates #cleanup Updates tailscale/corp#18514 Signed-off-by: Joe Tsai <joetsai@digital-static.net>
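A minimal sketch of the "clone on Write" fix, assuming an in-memory buffer that queues entries. `memBuf` is a hypothetical type, not logtail.memBuffer itself.

```go
package main

import (
	"bytes"
	"fmt"
)

// memBuf is a hypothetical in-memory buffer that queues written entries.
type memBuf struct {
	pending [][]byte
}

// Write stores a copy of p, so the caller is free to reuse p afterwards,
// matching the io.Writer contract.
func (m *memBuf) Write(p []byte) (int, error) {
	m.pending = append(m.pending, bytes.Clone(p))
	return len(p), nil
}

func main() {
	var m memBuf
	scratch := []byte("one")
	m.Write(scratch)
	copy(scratch, "two") // caller reuses its scratch buffer
	fmt.Println(string(m.pending[0]))
}
```

Without the clone, the queued entry would alias the caller's scratch buffer and silently change when the caller reuses it.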
2024-04-01logtail: delete unused code from old way to configure zstdBrad Fitzpatrick1-24/+3
Updates #cleanup Change-Id: I666ecf08ea67e461adf2a3f4daa9d1753b2dc1e4 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2024-04-01logtail: always zstd compress with FastestCompression and LowMemory (#11583)Joe Tsai1-3/+1
This is based on empirical testing using actual logs data. FastestCompression only incurs a marginal <1% compression ratio hit for a 2.25x reduction in memory use for small payloads (which are common if log uploads happen at a decently high frequency). The memory savings for large payloads is much lower (less than 1.1x reduction). LowMemory only incurs a marginal <5% hit on performance for a 1.6-2.0x reduction in memory use for small or large payloads. The memory gains for both settings justify the loss of benefits, which are arguably minimal. tailscale/corp#18514 Signed-off-by: Joe Tsai <joetsai@digital-static.net>
2024-03-29logtail: prevent js/wasm clients from picking TLS client certBrad Fitzpatrick1-0/+14
Corp details: https://github.com/tailscale/corp/issues/18177#issuecomment-2026598715 https://github.com/tailscale/corp/pull/18775#issuecomment-2027505036 Updates tailscale/corp#18177 Change-Id: I7c03a4884540b8519e0996088d085af77991f477 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2024-03-25logtail: move a scratch buffer to LoggerBrad Fitzpatrick1-5/+13
Rather than pass around a scratch buffer, put it on the Logger. This is a baby step towards removing the background uploading goroutine and starting it as needed. Updates tailscale/corp#18514 (insofar as it led me to look at this code) Change-Id: I6fd94581c28bde40fdb9fca788eb9590bcedae1b Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2024-03-21all: use zstdframe where sensible (#11491)Joe Tsai1-3/+19
Use the zstdframe package where sensible instead of plumbing around our own zstd.Encoder just for stateless operations. This causes logtail to have a dependency on zstd, but that's arguably okay since zstd support is implicit to the protocol between a client and the logging service. Also, virtually every caller to logger.NewLogger was manually setting up a zstd.Encoder anyways, meaning that zstd was functionally always a dependency. Updates #cleanup Updates tailscale/corp#18514 Signed-off-by: Joe Tsai <joetsai@digital-static.net>
2023-12-15tailscale/logtail: redact public ipv6 and ipv4 ip addresses within ↵as26432-0/+123
tailscaled. (#10531) Updates #15664 Signed-off-by: Anishka Singh <anishkasingh66@gmail.com>
2023-11-08logtail: fix Logger.Write return resultBrad Fitzpatrick2-1/+30
io.Writer says you need to write completely on err=nil. (the result int should be the same as the input buffer length) We weren't doing that. We used to, but at some point the verbose filtering was modifying buf before the final return of len(buf). We've been getting lucky probably, that callers haven't looked at our results and turned us into a short write error. Updates #cleanup Updates tailscale/corp#15664 Change-Id: I01e765ba35b86b759819e38e0072eceb9d10d75c Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
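The bug class described above can be illustrated with a writer that filters its input before storing it. This sketch assumes a verbosity prefix of the form `[v1] ` purely for illustration; `filteringWriter` is a hypothetical type.

```go
package main

import (
	"bytes"
	"fmt"
)

// filteringWriter is a hypothetical writer that strips a verbosity prefix
// before storing the entry.
type filteringWriter struct {
	out []byte
}

// Write records the input length before filtering, because io.Writer
// requires returning len(p) of the original input when err is nil.
func (w *filteringWriter) Write(p []byte) (int, error) {
	inLen := len(p)
	p = bytes.TrimPrefix(p, []byte("[v1] ")) // p is now shorter than the input
	w.out = append(w.out, p...)
	return inLen, nil // returning len(p) here would look like a short write
}

func main() {
	var w filteringWriter
	n, err := w.Write([]byte("[v1] hello"))
	fmt.Println(n, err, string(w.out))
}
```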
2023-09-06adjust build tags for tamagoAndrea Barisani1-1/+1
Signed-off-by: Andrea Barisani <andrea@inversepath.com>
2023-08-30adjust build tags for tamagoAndrea Barisani1-1/+1
Signed-off-by: Andrea Barisani <andrea@inversepath.com>
2023-08-24all: adjust some build tags for plan9Brad Fitzpatrick2-1/+3
I'm not saying it works, but it compiles. Updates #5794 Change-Id: I2f3c99732e67fe57a05edb25b758d083417f083e Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2023-08-23net/netmon: make ChangeFunc's signature take new ChangeDelta, not boolBrad Fitzpatrick1-3/+2
Updates #9040 Change-Id: Ia43752064a1a6ecefc8802b58d6eaa0b71cf1f84 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2023-07-21logtail: use tstime (#8607)Claire Wang3-33/+34
Updates #8587 Signed-off-by: Claire Wang <claire@tailscale.com>
2023-07-11logtail: fix race condition with sockstats label (#8578)Joe Tsai1-4/+9
Updates tailscale/corp#8427 Signed-off-by: Joe Tsai <joetsai@digital-static.net>
2023-06-11all: adjust some build tags for wasiBrad Fitzpatrick2-1/+1
A start. Updates #8320 Change-Id: I64057f977be51ba63ce635c56d67de7ecec415d1 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2023-05-11logtail: be less aggressive about re-uploads (#8117)Joe Tsai1-26/+35
The retry logic was pathological in the following ways: * If we restarted the logging service, any pending uploads would be placed in a retry-loop where it depended on backoff.Backoff, which was too aggressive. It would retry failures within milliseconds, taking at least 10 retries to hit a delay of 1 second. * In the event where a logstream was rate limited, the aggressive retry logic would severely exacerbate the problem since each retry would also log an error message. It is by chance that the rate of log error spam does not happen to exceed the rate limit itself. We modify the retry logic in the following ways: * We now respect the "Retry-After" header sent by the logging service. * Lacking a "Retry-After" header, we retry after a hard-coded period of 30 to 60 seconds. This avoids the thundering-herd effect when all nodes try reconnecting to the logging service at the same time after a restart. * We do not treat a status 400 as having been uploaded. This is simply not the behavior of the logging service. Updates #tailscale/corp#11213 Signed-off-by: Joe Tsai <joetsai@digital-static.net>
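The retry policy above can be sketched as a small function. This is an illustration under assumptions: `retryDelay` and `delayFor` are hypothetical helpers, only the delta-seconds form of Retry-After is parsed (the HTTP-date form is ignored), and the uniform jitter between 30 and 60 seconds is one plausible reading of "30 to 60 seconds".

```go
package main

import (
	"fmt"
	"math/rand/v2"
	"net/http"
	"strconv"
	"time"
)

// retryDelay picks how long to wait before the next upload attempt.
func retryDelay(h http.Header) time.Duration {
	if s := h.Get("Retry-After"); s != "" {
		// Only the delta-seconds form is handled in this sketch.
		if secs, err := strconv.Atoi(s); err == nil && secs >= 0 {
			return time.Duration(secs) * time.Second
		}
	}
	// No usable header: wait 30s plus up to 30s of random jitter so that
	// many clients do not all retry at the same instant.
	return 30*time.Second + rand.N(30*time.Second)
}

// delayFor is a convenience wrapper that builds the header from a string.
func delayFor(retryAfter string) time.Duration {
	h := http.Header{}
	if retryAfter != "" {
		h.Set("Retry-After", retryAfter)
	}
	return retryDelay(h)
}

func main() {
	fmt.Println(delayFor("7"))
	fmt.Println(delayFor("")) // somewhere in [30s, 60s)
}
```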
2023-04-20all: move network monitoring from wgengine/monitor to net/netmonMihai Parparita1-8/+8
We're using it in more and more places, and it's not really specific to our use of Wireguard (and does more than just link/interface monitoring). Also removes the separate interface we had for it in sockstats -- it's a small enough package (we already pull in all of its dependencies via other paths) that it's not worth the extra complexity. Updates #7621 Updates #7850 Signed-off-by: Mihai Parparita <mihai@tailscale.com>
2023-04-12net/sockstats: pass in logger to sockstats.WithSockStatsMihai Parparita1-1/+1
Using log.Printf may end up being printed out to the console, which is not desirable. I noticed this when I was investigating some client logs with `sockstats: trace "NetcheckClient" was overwritten by another`. That turns out to be harmless/expected (the netcheck client will fall back to the DERP client in some cases, which does its own sockstats trace). However, the log output could be visible to users if running the `tailscale netcheck` CLI command, which would be needlessly confusing. Updates tailscale/corp#9230 Signed-off-by: Mihai Parparita <mihai@tailscale.com>