path: root/wgengine
Age | Commit message | Author | Files | Lines
2021-05-16 | all: adapt to opaque netaddr types | Josh Bleecher Snyder | 20 | -156/+148
This commit is a mishmash of automated edits using gofmt:
    gofmt -r 'netaddr.IPPort{IP: a, Port: b} -> netaddr.IPPortFrom(a, b)' -w .
    gofmt -r 'netaddr.IPPrefix{IP: a, Bits: b} -> netaddr.IPPrefixFrom(a, b)' -w .
    gofmt -r 'a.IP.Is4 -> a.IP().Is4' -w .
    gofmt -r 'a.IP.As16 -> a.IP().As16' -w .
    gofmt -r 'a.IP.Is6 -> a.IP().Is6' -w .
    gofmt -r 'a.IP.As4 -> a.IP().As4' -w .
    gofmt -r 'a.IP.String -> a.IP().String' -w .
And regexps:
    \w*(.*)\.Port = (.*) -> $1 = $1.WithPort($2)
    \w*(.*)\.IP = (.*) -> $1 = $1.WithIP($2)
And lots of manual fixups.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
2021-05-14 | tsnet: add Tailscale-as-a-library package | Brad Fitzpatrick | 1 | -2/+12
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-05-11 | wgengine: remove wireguard-go DeviceOptions | Josh Bleecher Snyder | 3 | -16/+4
We no longer need them. This also removes the 32 bytes of prefix junk before endpoints. Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
2021-05-11 | all: add extra information to serialized endpoints | Josh Bleecher Snyder | 13 | -183/+239
magicsock.Conn.ParseEndpoint requires a peer's public key, disco key, and legacy ip/ports in order to do its job. We currently accomplish that by:
* adding the public key in our wireguard-go fork
* encoding the disco key as magic hostname
* using a bespoke comma-separated encoding
It's a bit messy. Instead, switch to something simpler: use a json-encoded struct containing exactly the information we need, in the form we use it.
Our wireguard-go fork still adds the public key to the address when it passes it to ParseEndpoint, but now the code compensating for that is just a couple of simple, well-commented lines. Once this commit is in, we can remove that part of the fork and remove the compensating code.
Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
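To make the "json-encoded struct" idea concrete, here is a minimal sketch. The `endpoint` type, its field names, and its JSON tags are assumptions for illustration and are not taken from the Tailscale source; only the general approach (one struct, serialized with `encoding/json`, carrying the public key, disco key, and legacy ip/ports) comes from the commit message.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// endpoint is a hypothetical stand-in for the serialized endpoint described
// above. All field names and tags are illustrative assumptions.
type endpoint struct {
	PublicKey string   `json:"pk"`
	DiscoKey  string   `json:"dk"`
	IPPorts   []string `json:"ipp,omitempty"` // legacy ip:port pairs
}

// encodeEndpoint serializes e into the string handed to the parser.
func encodeEndpoint(e endpoint) (string, error) {
	b, err := json.Marshal(e)
	return string(b), err
}

// parseEndpoint recovers the struct from its serialized form.
func parseEndpoint(s string) (endpoint, error) {
	var e endpoint
	err := json.Unmarshal([]byte(s), &e)
	return e, err
}

func main() {
	s, err := encodeEndpoint(endpoint{
		PublicKey: "pubkey-placeholder",
		DiscoKey:  "discokey-placeholder",
		IPPorts:   []string{"192.0.2.1:41641"},
	})
	if err != nil {
		panic(err)
	}
	e, err := parseEndpoint(s)
	fmt.Println(s, e.DiscoKey, err)
}
```

Compared with a magic hostname plus a comma-separated list, a single struct keeps every piece of information under an explicit name, so the parser does not have to infer meaning from position.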
2021-05-11 | wgengine/wglog: optimize wireguardGoString | Josh Bleecher Snyder | 1 | -7/+14
The new code is ugly, but much faster and leaner.
    name        old time/op    new time/op    delta
    SetPeers-8  7.81µs ± 1%    3.59µs ± 1%    -54.04%  (p=0.000 n=9+10)
    name        old alloc/op   new alloc/op   delta
    SetPeers-8  7.68kB ± 0%    2.53kB ± 0%    -67.08%  (p=0.000 n=10+10)
    name        old allocs/op  new allocs/op  delta
    SetPeers-8  237 ± 0%       99 ± 0%        -58.23%  (p=0.000 n=10+10)
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
2021-05-11 | wgengine/wglog: add BenchmarkSetPeers | Josh Bleecher Snyder | 1 | -0/+28
Because it showed up on hello profiles. Cycle through some moderate-sized sets of peers. This should cover the "small tweaks to netmap" and the "up/down cycle" cases. Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
2021-05-11 | internal/deephash: rename from deepprint | Brad Fitzpatrick | 1 | -4/+4
Yes, it printed, but that was an implementation detail for hashing. And coming optimization will make it print even less. Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-05-11 | Revert "wgengine/bench: skip flaky test" | Josh Bleecher Snyder | 1 | -1/+0
This reverts commit d707e2f7e524a994ce38615d74f1793784705232. Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
2021-05-11 | wgengine/bench: ignore "engine closing" errors | Josh Bleecher Snyder | 2 | -1/+10
On benchmark completion, we shut down the wgengine. If we happen to poll for status during shutdown, we get an "engine closing" error. It doesn't hurt anything; ignore it. Fixes tailscale/corp#1776 Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
2021-05-11 | wgengine/bench: skip flaky test | Brad Fitzpatrick | 1 | -0/+1
Updates tailscale/corp#1776 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-05-10 | wgengine/bench: hold lock in TrafficGen.GotPacket while calling first packet callback | Josh Bleecher Snyder | 1 | -3/+1
Without any synchronization here, the "first packet" callback can be delayed indefinitely, while other work continues. Since the callback starts the benchmark timer, this could skew results. Worse, if the benchmark manages to complete before the benchmark timer begins, it'll cause a data race with the benchmark shutdown performed by package testing. That is what is reported in #1881.
This is a bit unfortunate, in that it means that users of TrafficGen have to be careful to keep this callback speedy and lightweight and to avoid deadlocks.
Fixes #1881
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
2021-05-10 | wgengine/bench: handle multiple Engine status callbacks | Josh Bleecher Snyder | 1 | -2/+4
It is possible to get multiple status callbacks from an Engine. We need to wait for at least one from each Engine. Without limiting to one per Engine, wait.Wait can exit early or can panic due to a negative counter. Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
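One way to count at most one callback per Engine is to guard each `Done` call with its own `sync.Once`. The sketch below illustrates that idea under stated assumptions: `statusWaiter` and its methods are hypothetical names, and the actual benchmark code may solve the problem differently.

```go
package main

import (
	"fmt"
	"sync"
)

// statusWaiter blocks until each of n engines has reported status at least
// once. Extra callbacks from the same engine are ignored, so the WaitGroup
// counter can never go negative. (Hypothetical type, for illustration only.)
type statusWaiter struct {
	wg    sync.WaitGroup
	onces []sync.Once
}

func newStatusWaiter(n int) *statusWaiter {
	w := &statusWaiter{onces: make([]sync.Once, n)}
	w.wg.Add(n)
	return w
}

// gotStatus is safe to call any number of times for the same engine index.
func (w *statusWaiter) gotStatus(engine int) {
	w.onces[engine].Do(w.wg.Done)
}

func (w *statusWaiter) wait() { w.wg.Wait() }

func main() {
	w := newStatusWaiter(2)
	go func() {
		w.gotStatus(0)
		w.gotStatus(0) // duplicate callback: ignored
		w.gotStatus(1)
	}()
	w.wait()
	fmt.Println("every engine reported at least once")
}
```

Without the per-engine guard, two callbacks from engine 0 would satisfy a counter of two before engine 1 had reported, and a third callback would panic with a negative WaitGroup counter.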
2021-05-10 | wgengine/bench: close Engines on benchmark completion | Josh Bleecher Snyder | 3 | -3/+10
This reduces the speed with which these benchmarks exhaust their supply of fds. Not to zero unfortunately, but it's still helpful when doing long runs. Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
2021-05-06 | wgengine/magicsock: rename discoEndpoint.wgEndpointHostPort to wgEndpoint | Josh Bleecher Snyder | 1 | -14/+14
Fields rename only. Part of the general effort to make our code agnostic about endpoint formatting. It's just a name, but it will soon be a misleading one; be more generic. Do this as a separate commit because it generates a lot of whitespace changes. Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
2021-05-06 | wgengine/magicsock: use netaddr.MustParseIPPrefix | Josh Bleecher Snyder | 1 | -10/+1
Delete our bespoke helper. Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
2021-05-06 | all: s/CreateEndpoint/ParseEndpoint/ in docs | Josh Bleecher Snyder | 2 | -7/+7
Upstream wireguard-go renamed the interface method from CreateEndpoint to ParseEndpoint. I missed some comments. Fix them. Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
2021-05-06 | wgengine/wgcfg: make device test endpoint-format-agnostic | Josh Bleecher Snyder | 1 | -2/+26
By using conn.NewDefaultBind, this test requires that our endpoints be comprehensible to wireguard-go. Instead, use a no-op bind that treats endpoints as opaque strings. Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
2021-05-06 | wgengine/wgcfg: use autogenerated Clone methods | Josh Bleecher Snyder | 3 | -29/+64
Delete the manually written ones named Copy. Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
2021-05-06 | wgengine/magicsock: simplify legacy endpoint DstToString | Josh Bleecher Snyder | 1 | -11/+5
Legacy endpoints (addrSet) currently reconstruct their dst string when requested. Instead, store the dst string we were given to begin with. In addition to being simpler and cheaper, this makes less code aware of how to interpret endpoint strings. Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
2021-05-06 | wgengine/wgcfg: return better errors from DeviceConfig, ReconfigDevice | Josh Bleecher Snyder | 1 | -8/+10
Prefer the error from the actual wireguard-go device method call, not {To,From}UAPI, as those tend to be less interesting I/O errors. Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
2021-05-06 | wgengine/wgcfg: prevent ReconfigDevice from hanging on error | Josh Bleecher Snyder | 1 | -1/+2
When wireguard-go's UAPI interface fails with an error, ReconfigDevice hangs. Fix that by buffering the channel and closing the writer after the call. The code now matches the corresponding code in DeviceConfig, where I got it right. Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
2021-05-06 | wgengine/userspace: delete HandshakeDone | Josh Bleecher Snyder | 1 | -186/+1
It is unused, and has been since early Feb 2021 (Tailscale 1.6). We can't delete the DeviceOptions entirely yet; first #1831 and #1839 need to go in, along with some wireguard-go changes. Deleting this chunk of code now will make the later commits more clearly correct. Pingers can now go too. Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
2021-05-06 | wgengine/netstack: avoid delivering incoming packets to both netstack + host | Brad Fitzpatrick | 1 | -1/+8
The earlier eb06ec172f1d984bb87c589da1dd2d3f15dc6d82 fixed the flaky SSH issue (tailscale/corp#1725) by making sure that packets addressed to Tailscale IPs in hybrid netstack mode weren't delivered to netstack, but another issue remained: All traffic handled by netstack was also potentially being handled by the host networking stack, as the filter hook returned "Accept", which made it keep processing. This could lead to various random racy chaos as a function of OS/firewalls/routes/etc. Instead, once we inject into netstack, stop our caller's packet processing. Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-05-05 | wgengine: fix pendopen debug to not track SYN+ACKs, show Node.Online state | Brad Fitzpatrick | 1 | -4/+23
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-05-05 | wgengine/netstack: don't pass non-subnet traffic to netstack in hybrid mode | Brad Fitzpatrick | 1 | -1/+22
Fixes tailscale/corp#1725 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-05-05 | net/tsaddr: add NewContainsIPFunc (move from wgengine) | Brad Fitzpatrick | 1 | -24/+2
I want to use this from netstack but it's not exported. Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-05-04 | wgengine/router: use net.IP.Equal instead of bytes.Equal to compare IPs | Josh Bleecher Snyder | 1 | -2/+2
Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
2021-05-04 | wgengine/router: remove unused field | Josh Bleecher Snyder | 1 | -9/+0
Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
2021-05-04 | all: use lower-case letters at the start of error message | Josh Bleecher Snyder | 1 | -1/+1
Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
2021-05-03 | wgengine/magicsock: delete cursed tests | Josh Bleecher Snyder | 1 | -152/+0
Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
2021-04-30 | wgengine/wglog: improve wireguard-go logging rate limiting | Josh Bleecher Snyder | 2 | -34/+54
Prior to wireguard-go using printf-style logging, all wireguard-go logging occurred using format string "%s". We fixed that but continued to use %s when we rewrote peer identifiers into Tailscale style. This commit removes that %s, which makes rate limiting work correctly. As a happy side-benefit, it should generate less garbage.
Instead of replacing all wireguard-go peer identifiers that might occur anywhere in a fully formatted log string, assume that they only come from args. Check all args for things that look like *device.Peers and replace them with appropriately reformatted strings.
There is a variety of ways that this could go wrong (unusual format verbs or modifiers, peer identifiers occurring as part of a larger printed object, future API changes), but none of them occur now, are likely to be added, or would be hard to work around if they did.
Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
2021-04-30 | wgengine/wglog: delay formatting | Josh Bleecher Snyder | 1 | -5/+4
The "stop phrases" we use all occur in wireguard-go in the format string. We can avoid doing a bunch of fmt.Sprintf work when they appear. Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
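The saving described above comes from testing the format string before any formatting happens. Here is a minimal sketch of that ordering; the `stopPhrases` values and the `render` function are hypothetical and are not taken from the wglog package.

```go
package main

import (
	"fmt"
	"strings"
)

// stopPhrases lists messages to suppress. These values are placeholders
// chosen for illustration, not the phrases used by the real logger.
var stopPhrases = []string{"routine: ", "keepalive packet"}

// render returns the formatted log line, or ok=false if the message should
// be dropped. The stop-phrase check inspects only the format string, so a
// suppressed message never pays for fmt.Sprintf.
func render(format string, args ...any) (line string, ok bool) {
	for _, p := range stopPhrases {
		if strings.Contains(format, p) {
			return "", false
		}
	}
	return fmt.Sprintf(format, args...), true
}

func main() {
	if line, ok := render("%v - Sending keepalive packet", "peer1"); ok {
		fmt.Println(line)
	}
	if line, ok := render("%v - Handshake complete", "peer1"); ok {
		fmt.Println(line)
	}
}
```

This only works because the phrases appear in the format string itself; a phrase that arrived through an argument would slip past the check.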
2021-04-29 | net/dns: add GOOS build tags | Josh Bleecher Snyder | 1 | -0/+2
Fixes #1786 Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
2021-04-29 | all: delete wgcfg.Key and wgcfg.PrivateKey | Josh Bleecher Snyder | 12 | -374/+28
For historical reasons, we ended up with two near-duplicate copies of curve25519 key types, one in the wireguard-go module (wgcfg) and one in the tailscale module (types/wgkey). Then we moved wgcfg to the tailscale module. We can now remove the wgcfg key type in favor of wgkey. Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
2021-04-28 | wgengine/magicsock: always run ReceiveIPv6 | Josh Bleecher Snyder | 1 | -7/+4
One of the consequences of the bind refactoring in 6f23087175 is that attempting to bind an IPv6 socket will always result in c.pconn6.pconn being non-nil. If the bind fails, it'll be set to a placeholder packet conn that blocks forever. As a result, we can always run ReceiveIPv6 and health check it. This removes IPv4/IPv6 asymmetry and also will allow health checks to detect any IPv6 receive func failures. Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
2021-04-28 | health: track whether we have a functional udp4 bind | Josh Bleecher Snyder | 1 | -0/+6
Suggested-by: Brad Fitzpatrick <bradfitz@tailscale.com> Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
2021-04-28 | wgengine/magicsock: use netaddr.IP in listenPacket | Josh Bleecher Snyder | 1 | -7/+18
It must be an IP address; enforce that at the type level. Suggested-by: Brad Fitzpatrick <bradfitz@tailscale.com> Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
2021-04-28 | wgengine/magicsock: unify initial bind and rebind | Josh Bleecher Snyder | 1 | -59/+122
We had two separate code paths for the initial UDP listener bind and any subsequent rebinds. IPv6 got left out of the rebind code. Rather than duplicate it there, unify the two code paths. Then improve the resulting code:
* Rebind had nested listen attempts to try the user-specified port first, and then fall back to :0 if that failed. Convert that into a loop.
* Initial bind tried only the user-specified port. Rebind tried the user-specified port and 0. But there are actually three ports of interest: The one the user specified, the most recent port in use, and 0. We now try all three in order, as appropriate.
* In the extremely rare case in which binding to port 0 fails, use a dummy net.PacketConn whose reads block until close. This will keep the wireguard-go receive func goroutine alive.
As a pleasant side-effect of this, if we decide that we need to resuscitate #1796, it will now be much easier.
Fixes #1799
Co-authored-by: David Anderson <danderson@tailscale.com>
Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
2021-04-28 | wgengine/magicsock: remove DefaultPort const | Josh Bleecher Snyder | 1 | -15/+1
Assume it'll stay at 0 forever, so hard-code it and delete code conditional on it being non-0. Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
2021-04-28 | wgengine/magicsock: remove context arg from listenPacket | Josh Bleecher Snyder | 1 | -8/+8
It was set to context.Background by all callers, for the same reasons. Set it locally instead, to simplify call sites. Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
2021-04-27 | wgengine: periodically poll engine status for logging side effect | Brad Fitzpatrick | 1 | -0/+17
Fixes tailscale/corp#1560 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-04-27 | wgengine: update a log line from 'weird' to conventional 'unexpected' | Brad Fitzpatrick | 1 | -1/+1
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-04-26 | health, wgengine: fix receive func health checks for the fourth time | Josh Bleecher Snyder | 1 | -0/+4
The old implementation knew too much about how wireguard-go worked. As a result, it missed genuine problems that occurred due to unrelated bugs.
This fourth attempt to fix the health checks takes a black box approach. A receive func is healthy if one (or both) of these conditions holds:
* It is currently running and blocked.
* It has been executed recently.
The second condition is required because receive functions are not continuously executing. wireguard-go calls them and then processes their results before calling them again.
There is a theoretical false positive if wireguard-go takes longer than one minute to process the results of a receive func execution. If that happens, we have other problems.
Updates #1790
Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
2021-04-26 | health: delete ReceiveFunc health checks | Josh Bleecher Snyder | 1 | -25/+0
They were not doing their job. They need yet another conceptual re-think. Start by clearing the decks. Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
2021-04-26 | net/tstun: split TUN events channel into up/down and MTU | Josh Bleecher Snyder | 1 | -5/+1
We had a long-standing bug in which our TUN events channel was being received from simultaneously in two places. The first is wireguard-go. At wgengine/userspace.go:366, we pass e.tundev to wireguard-go, which starts a goroutine (RoutineTUNEventReader) that receives from that channel and uses events to adjust the MTU and bring the device up/down. At wgengine/userspace.go:374, we launch a goroutine that receives from e.tundev, logs MTU changes, and triggers state updates when up/down changes occur. Events were getting delivered haphazardly between the two of them. We don't really want wireguard-go to receive the up/down events; we control the state of the device explicitly by calling device.Up. And the userspace.go loop MTU logging duplicates logging that wireguard-go does when it received MTU updates. So this change splits the single TUN events channel into up/down and other (aka MTU), and sends them to the parties that ought to receive them. I'm actually a bit surprised that this hasn't caused more visible trouble. If a down event went to wireguard-go but the subsequent up event went to userspace.go, we could end up with the wireguard-go device disappearing. I believe that this may also (somewhat accidentally) be a fix for #1790. Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
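The underlying problem is that two goroutines receiving from one channel each see only some of the values. A minimal sketch of the fix, with a single reader that routes each event to the party that should handle it, is shown below; the `event` type and `splitEvents` are hypothetical stand-ins and do not reproduce the tstun code.

```go
package main

import "fmt"

// event is a hypothetical stand-in for a TUN device event.
type event int

const (
	eventUp event = iota
	eventDown
	eventMTU
)

// splitEvents is the only receiver on src. It forwards up/down events to one
// channel and everything else (here, MTU updates) to another, so each
// consumer sees every event of its kind. Both outputs close when src closes.
func splitEvents(src <-chan event) (upDown, other <-chan event) {
	ud := make(chan event)
	ot := make(chan event)
	go func() {
		defer close(ud)
		defer close(ot)
		for e := range src {
			if e == eventUp || e == eventDown {
				ud <- e
			} else {
				ot <- e
			}
		}
	}()
	return ud, ot
}

func main() {
	src := make(chan event, 3)
	src <- eventDown
	src <- eventMTU
	src <- eventUp
	close(src)

	upDown, other := splitEvents(src)
	done := make(chan struct{})
	go func() {
		for e := range other {
			fmt.Println("MTU consumer got", e)
		}
		close(done)
	}()
	for e := range upDown {
		fmt.Println("up/down consumer got", e)
	}
	<-done
}
```

Because the outputs are unbuffered in this sketch, both consumers must keep receiving; a slow consumer on one side would stall delivery to the other.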
2021-04-26 | wgengine/bench: improved rate selection. | Avery Pennarun | 3 | -17/+31
The old decay-based one took a while to converge. This new one (based very loosely on TCP BBR) seems to converge quickly on what seems to be the best speed. Signed-off-by: Avery Pennarun <apenwarr@tailscale.com>
2021-04-26 | wgengine/bench: speed test for channels, sockets, and wireguard-go. | Avery Pennarun | 4 | -0/+959
This tries to generate traffic at a rate that will saturate the receiver, without overdoing it, even in the event of packet loss. It's unrealistically more aggressive than TCP (which will back off quickly in case of packet loss) but less silly than a blind test that just generates packets as fast as it can (which can cause all the CPU to be absorbed by the transmitter, giving an incorrect impression of how much capacity the total system has).
Initial indications are that a syscall about every 10 packets (TCP bulk delivery) is roughly the same speed as sending every packet through a channel. A syscall per packet is about 5x-10x slower than that. The whole tailscale wireguard-go + magicsock + packet filter combination is about 4x slower again, which is better than I thought we'd do, but probably has room for improvement.
Note that in "full" tailscale, there is also a tundev read/write for every packet, effectively doubling the syscall overhead per packet.
Given these numbers, it seems like read/write syscalls are only 25-40% of the total CPU time used in tailscale proper, so we do have significant non-syscall optimization work to do too.
Sample output:
    $ GOMAXPROCS=2 go test -bench . -benchtime 5s ./cmd/tailbench
    goos: linux
    goarch: amd64
    pkg: tailscale.com/cmd/tailbench
    cpu: Intel(R) Core(TM) i7-4785T CPU @ 2.20GHz
    BenchmarkTrivialNoAlloc/32-2         56340248    93.85 ns/op   340.98 MB/s   0 %lost       0 B/op     0 allocs/op
    BenchmarkTrivialNoAlloc/124-2        57527490    99.27 ns/op   1249.10 MB/s  0 %lost       0 B/op     0 allocs/op
    BenchmarkTrivialNoAlloc/1024-2       52537773    111.3 ns/op   9200.39 MB/s  0 %lost       0 B/op     0 allocs/op
    BenchmarkTrivial/32-2                41878063    135.6 ns/op   236.04 MB/s   0 %lost       0 B/op     0 allocs/op
    BenchmarkTrivial/124-2               41270439    138.4 ns/op   896.02 MB/s   0 %lost       0 B/op     0 allocs/op
    BenchmarkTrivial/1024-2              36337252    154.3 ns/op   6635.30 MB/s  0 %lost       0 B/op     0 allocs/op
    BenchmarkBlockingChannel/32-2        12171654    494.3 ns/op   64.74 MB/s    0 %lost       1791 B/op  0 allocs/op
    BenchmarkBlockingChannel/124-2       12149956    507.8 ns/op   244.17 MB/s   0 %lost       1792 B/op  1 allocs/op
    BenchmarkBlockingChannel/1024-2      11034754    528.8 ns/op   1936.42 MB/s  0 %lost       1792 B/op  1 allocs/op
    BenchmarkNonlockingChannel/32-2      8960622     2195 ns/op    14.58 MB/s    8.825 %lost   1792 B/op  1 allocs/op
    BenchmarkNonlockingChannel/124-2     3014614     2224 ns/op    55.75 MB/s    11.18 %lost   1792 B/op  1 allocs/op
    BenchmarkNonlockingChannel/1024-2    3234915     1688 ns/op    606.53 MB/s   3.765 %lost   1792 B/op  1 allocs/op
    BenchmarkDoubleChannel/32-2          8457559     764.1 ns/op   41.88 MB/s    5.945 %lost   1792 B/op  1 allocs/op
    BenchmarkDoubleChannel/124-2         5497726     1030 ns/op    120.38 MB/s   12.14 %lost   1792 B/op  1 allocs/op
    BenchmarkDoubleChannel/1024-2        7985656     1360 ns/op    752.86 MB/s   13.57 %lost   1792 B/op  1 allocs/op
    BenchmarkUDP/32-2                    1652134     3695 ns/op    8.66 MB/s     0 %lost       176 B/op   3 allocs/op
    BenchmarkUDP/124-2                   1621024     3765 ns/op    32.94 MB/s    0 %lost       176 B/op   3 allocs/op
    BenchmarkUDP/1024-2                  1553750     3825 ns/op    267.72 MB/s   0 %lost       176 B/op   3 allocs/op
    BenchmarkTCP/32-2                    11056336    503.2 ns/op   63.60 MB/s    0 %lost       0 B/op     0 allocs/op
    BenchmarkTCP/124-2                   11074869    533.7 ns/op   232.32 MB/s   0 %lost       0 B/op     0 allocs/op
    BenchmarkTCP/1024-2                  8934968     671.4 ns/op   1525.20 MB/s  0 %lost       0 B/op     0 allocs/op
    BenchmarkWireGuardTest/32-2          1403702     4547 ns/op    7.04 MB/s     14.37 %lost   467 B/op   3 allocs/op
    BenchmarkWireGuardTest/124-2         780645      7927 ns/op    15.64 MB/s    1.537 %lost   420 B/op   3 allocs/op
    BenchmarkWireGuardTest/1024-2        512671      11791 ns/op   86.85 MB/s    0.5206 %lost  411 B/op   3 allocs/op
    PASS
    ok tailscale.com/wgengine/bench 195.724s
Updates #414.
Signed-off-by: Avery Pennarun <apenwarr@tailscale.com>
2021-04-24 | wgengine/router{win}: ignore broadcast routes added by Windows when removing routes. | Maisem Ali | 4 | -57/+117
Signed-off-by: Maisem Ali <maisem@tailscale.com>
2021-04-23 | health, wgengine: fix receive func health checks yet again | Josh Bleecher Snyder | 1 | -14/+25
The existing implementation was completely, embarrassingly conceptually broken. We aren't able to see whether wireguard-go's receive function goroutines are running or not. All we can do is model that based on what we have done. This commit fixes that model. Fixes #1781 Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
2021-04-22 | health, wgengine/magicsock: avoid receive function false positives | Josh Bleecher Snyder | 1 | -1/+7
Avery reported a sub-ms health transition from "receiveIPv4 not running" to "ok". To avoid these transient false-positives, be more precise about the expected lifetime of receive funcs. The problematic case is one in which they were started but exited prior to a call to connBind.Close. Explicitly represent started vs running state, taking care with the order of updates. Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>