<feed xmlns='http://www.w3.org/2005/Atom'>
<title>wireguard-linux/net/rds/ib_rdma.c, branch devel</title>
<subtitle>WireGuard for the Linux kernel</subtitle>
<id>http://git.waynecole.info/wireguard-linux/atom?h=devel</id>
<link rel='self' href='http://git.waynecole.info/wireguard-linux/atom?h=devel'/>
<link rel='alternate' type='text/html' href='http://git.waynecole.info/wireguard-linux/'/>
<updated>2026-04-12T20:33:19Z</updated>
<entry>
<title>net/rds: Optimize rds_ib_laddr_check</title>
<updated>2026-04-12T20:33:19Z</updated>
<author>
<name>Håkon Bugge</name>
<email>haakon.bugge@oracle.com</email>
</author>
<published>2026-04-08T08:04:19Z</published>
<link rel='alternate' type='text/html' href='http://git.waynecole.info/wireguard-linux/commit/?id=236f718ac885965fa886440b9898dfae185c9733'/>
<id>urn:sha1:236f718ac885965fa886440b9898dfae185c9733</id>
<content type='text'>
rds_ib_laddr_check() creates a CM_ID and attempts to bind the address
in question to it. This in order to qualify the allegedly local
address as a usable IB/RoCE address.

In the field, ExaWatcher runs rds-ping to all ports in the fabric from
all local ports. This using all active ToS'es. In a full rack system,
we have 14 cell servers and eight db servers. Typically, 6 ToS'es are
used. This implies 528 rds-ping invocations per ExaWatcher's "RDSinfo"
interval.

Adding to this, each rds-ping invocation creates eight sockets and
binds the local address to them:

socket(AF_RDS, SOCK_SEQPACKET, 0)       = 3
bind(3, {sa_family=AF_INET, sin_port=htons(0),
	sin_addr=inet_addr("192.168.36.2")}, 16) = 0
socket(AF_RDS, SOCK_SEQPACKET, 0)       = 4
bind(4, {sa_family=AF_INET, sin_port=htons(0),
	sin_addr=inet_addr("192.168.36.2")}, 16) = 0
socket(AF_RDS, SOCK_SEQPACKET, 0)       = 5
bind(5, {sa_family=AF_INET, sin_port=htons(0),
	sin_addr=inet_addr("192.168.36.2")}, 16) = 0
socket(AF_RDS, SOCK_SEQPACKET, 0)       = 6
bind(6, {sa_family=AF_INET, sin_port=htons(0),
	sin_addr=inet_addr("192.168.36.2")}, 16) = 0
socket(AF_RDS, SOCK_SEQPACKET, 0)       = 7
bind(7, {sa_family=AF_INET, sin_port=htons(0),
	sin_addr=inet_addr("192.168.36.2")}, 16) = 0
socket(AF_RDS, SOCK_SEQPACKET, 0)       = 8
bind(8, {sa_family=AF_INET, sin_port=htons(0),
	sin_addr=inet_addr("192.168.36.2")}, 16) = 0
socket(AF_RDS, SOCK_SEQPACKET, 0)       = 9
bind(9, {sa_family=AF_INET, sin_port=htons(0),
	sin_addr=inet_addr("192.168.36.2")}, 16) = 0
socket(AF_RDS, SOCK_SEQPACKET, 0)       = 10
bind(10, {sa_family=AF_INET, sin_port=htons(0),
	sin_addr=inet_addr("192.168.36.2")}, 16) = 0

So, at every interval ExaWatcher executes rds-ping's, 4224 CM_IDs are
allocated, considering this full-rack system. After the a CM_ID has
been allocated, rdma_bind_addr() is called, with the port number being
zero. This implies that the CMA will attempt to search for an un-used
ephemeral port. Simplified, the algorithm is to start at a random
position in the available port space, and then if needed, iterate
until an un-used port is found.

The book-keeping of used ports uses the idr system, which again uses
slab to allocate new struct idr_layer's. The size is 2092 bytes and
slab tries to reduce the wasted space. Hence, it chooses an order:3
allocation, for which 15 idr_layer structs will fit and only 1388
bytes are wasted per the 32KiB order:3 chunk.

Although this order:3 allocation seems like a good space/speed
trade-off, it does not resonate well with how it used by the CMA. The
combination of the randomized starting point in the port space (which
has close to zero spatial locality) and the close proximity in time of
the 4224 invocations of the rds-ping's, creates a memory hog for
order:3 allocations.

These costly allocations may need reclaims and/or compaction. At
worst, they may fail and produce a stack trace such as (from uek4):

[&lt;ffffffff811a72d5&gt;] __inc_zone_page_state+0x35/0x40
[&lt;ffffffff811c2e97&gt;] page_add_file_rmap+0x57/0x60
[&lt;ffffffffa37ca1df&gt;] remove_migration_pte+0x3f/0x3c0 [ksplice_6cn872bt_vmlinux_new]
[&lt;ffffffff811c3de8&gt;] rmap_walk+0xd8/0x340
[&lt;ffffffff811e8860&gt;] remove_migration_ptes+0x40/0x50
[&lt;ffffffff811ea83c&gt;] migrate_pages+0x3ec/0x890
[&lt;ffffffff811afa0d&gt;] compact_zone+0x32d/0x9a0
[&lt;ffffffff811b00ed&gt;] compact_zone_order+0x6d/0x90
[&lt;ffffffff811b03b2&gt;] try_to_compact_pages+0x102/0x270
[&lt;ffffffff81190e56&gt;] __alloc_pages_direct_compact+0x46/0x100
[&lt;ffffffff8119165b&gt;] __alloc_pages_nodemask+0x74b/0xaa0
[&lt;ffffffff811d8411&gt;] alloc_pages_current+0x91/0x110
[&lt;ffffffff811e3b0b&gt;] new_slab+0x38b/0x480
[&lt;ffffffffa41323c7&gt;] __slab_alloc+0x3b7/0x4a0 [ksplice_s0dk66a8_vmlinux_new]
[&lt;ffffffff811e42ab&gt;] kmem_cache_alloc+0x1fb/0x250
[&lt;ffffffff8131fdd6&gt;] idr_layer_alloc+0x36/0x90
[&lt;ffffffff8132029c&gt;] idr_get_empty_slot+0x28c/0x3d0
[&lt;ffffffff813204ad&gt;] idr_alloc+0x4d/0xf0
[&lt;ffffffffa051727d&gt;] cma_alloc_port+0x4d/0xa0 [rdma_cm]
[&lt;ffffffffa0517cbe&gt;] rdma_bind_addr+0x2ae/0x5b0 [rdma_cm]
[&lt;ffffffffa09d8083&gt;] rds_ib_laddr_check+0x83/0x2c0 [ksplice_6l2xst5i_rds_rdma_new]
[&lt;ffffffffa05f892b&gt;] rds_trans_get_preferred+0x5b/0xa0 [rds]
[&lt;ffffffffa05f09f2&gt;] rds_bind+0x212/0x280 [rds]
[&lt;ffffffff815b4016&gt;] SYSC_bind+0xe6/0x120
[&lt;ffffffff815b4d3e&gt;] SyS_bind+0xe/0x10
[&lt;ffffffff816b031a&gt;] system_call_fastpath+0x18/0xd4

To avoid these excessive calls to rdma_bind_addr(), we optimize
rds_ib_laddr_check() by simply checking if the address in question has
been used before. The rds_rdma module keeps track of addresses
associated with IB devices, and the function rds_ib_get_device() is
used to determine if the address already has been qualified as a valid
local address. If not found, we call the legacy rds_ib_laddr_check(),
now renamed to rds_ib_laddr_check_cm().

Signed-off-by: Håkon Bugge &lt;haakon.bugge@oracle.com&gt;
Signed-off-by: Somasundaram Krishnasamy &lt;somasundaram.krishnasamy@oracle.com&gt;
Signed-off-by: Gerd Rausch &lt;gerd.rausch@oracle.com&gt;
Signed-off-by: Allison Henderson &lt;achender@kernel.org&gt;
Link: https://patch.msgid.link/20260408080420.540032-2-achender@kernel.org
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>rds: ib: reject FRMR registration before IB connection is established</title>
<updated>2026-04-02T00:52:40Z</updated>
<author>
<name>Weiming Shi</name>
<email>bestswngs@gmail.com</email>
</author>
<published>2026-03-30T16:32:38Z</published>
<link rel='alternate' type='text/html' href='http://git.waynecole.info/wireguard-linux/commit/?id=a54ecccfae62c5c85259ae5ea5d9c20009519049'/>
<id>urn:sha1:a54ecccfae62c5c85259ae5ea5d9c20009519049</id>
<content type='text'>
rds_ib_get_mr() extracts the rds_ib_connection from conn-&gt;c_transport_data
and passes it to rds_ib_reg_frmr() for FRWR memory registration. On a
fresh outgoing connection, ic is allocated in rds_ib_conn_alloc() with
i_cm_id = NULL because the connection worker has not yet called
rds_ib_conn_path_connect() to create the rdma_cm_id. When sendmsg() with
RDS_CMSG_RDMA_MAP is called on such a connection, the sendmsg path parses
the control message before any connection establishment, allowing
rds_ib_post_reg_frmr() to dereference ic-&gt;i_cm_id-&gt;qp and crash the
kernel.

The existing guard in rds_ib_reg_frmr() only checks for !ic (added in
commit 9e630bcb7701), which does not catch this case since ic is allocated
early and is always non-NULL once the connection object exists.

 KASAN: null-ptr-deref in range [0x0000000000000010-0x0000000000000017]
 RIP: 0010:rds_ib_post_reg_frmr+0x50e/0x920
 Call Trace:
  rds_ib_post_reg_frmr (net/rds/ib_frmr.c:167)
  rds_ib_map_frmr (net/rds/ib_frmr.c:252)
  rds_ib_reg_frmr (net/rds/ib_frmr.c:430)
  rds_ib_get_mr (net/rds/ib_rdma.c:615)
  __rds_rdma_map (net/rds/rdma.c:295)
  rds_cmsg_rdma_map (net/rds/rdma.c:860)
  rds_sendmsg (net/rds/send.c:1363)
  ____sys_sendmsg
  do_syscall_64

Add a check in rds_ib_get_mr() that verifies ic, i_cm_id, and qp are all
non-NULL before proceeding with FRMR registration, mirroring the guard
already present in rds_ib_post_inv(). Return -ENODEV when the connection
is not ready, which the existing error handling in rds_cmsg_send() converts
to -EAGAIN for userspace retry and triggers rds_conn_connect_if_down() to
start the connection worker.

Fixes: 1659185fb4d0 ("RDS: IB: Support Fastreg MR (FRMR) memory registration mode")
Reported-by: Xiang Mei &lt;xmei5@asu.edu&gt;
Signed-off-by: Weiming Shi &lt;bestswngs@gmail.com&gt;
Reviewed-by: Allison Henderson &lt;achender@kernel.org&gt;
Link: https://patch.msgid.link/20260330163237.2752440-2-bestswngs@gmail.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>Convert 'alloc_obj' family to use the new default GFP_KERNEL argument</title>
<updated>2026-02-22T01:09:51Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2026-02-22T00:37:42Z</published>
<link rel='alternate' type='text/html' href='http://git.waynecole.info/wireguard-linux/commit/?id=bf4afc53b77aeaa48b5409da5c8da6bb4eff7f43'/>
<id>urn:sha1:bf4afc53b77aeaa48b5409da5c8da6bb4eff7f43</id>
<content type='text'>
This was done entirely with mindless brute force, using

    git grep -l '\&lt;k[vmz]*alloc_objs*(.*, GFP_KERNEL)' |
        xargs sed -i 's/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/'

to convert the new alloc_obj() users that had a simple GFP_KERNEL
argument to just drop that argument.

Note that due to the extreme simplicity of the scripting, any slightly
more complex cases spread over multiple lines would not be triggered:
they definitely exist, but this covers the vast bulk of the cases, and
the resulting diff is also then easier to check automatically.

For the same reason the 'flex' versions will be done as a separate
conversion.

Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>treewide: Replace kmalloc with kmalloc_obj for non-scalar types</title>
<updated>2026-02-21T09:02:28Z</updated>
<author>
<name>Kees Cook</name>
<email>kees@kernel.org</email>
</author>
<published>2026-02-21T07:49:23Z</published>
<link rel='alternate' type='text/html' href='http://git.waynecole.info/wireguard-linux/commit/?id=69050f8d6d075dc01af7a5f2f550a8067510366f'/>
<id>urn:sha1:69050f8d6d075dc01af7a5f2f550a8067510366f</id>
<content type='text'>
This is the result of running the Coccinelle script from
scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to
avoid scalar types (which need careful case-by-case checking), and
instead replace kmalloc-family calls that allocate struct or union
object instances:

Single allocations:	kmalloc(sizeof(TYPE), ...)
are replaced with:	kmalloc_obj(TYPE, ...)

Array allocations:	kmalloc_array(COUNT, sizeof(TYPE), ...)
are replaced with:	kmalloc_objs(TYPE, COUNT, ...)

Flex array allocations:	kmalloc(struct_size(PTR, FAM, COUNT), ...)
are replaced with:	kmalloc_flex(*PTR, FAM, COUNT, ...)

(where TYPE may also be *VAR)

The resulting allocations no longer return "void *", instead returning
"TYPE *".

Signed-off-by: Kees Cook &lt;kees@kernel.org&gt;
</content>
</entry>
<entry>
<title>net: WQ_PERCPU added to alloc_workqueue users</title>
<updated>2025-09-23T00:40:30Z</updated>
<author>
<name>Marco Crivellari</name>
<email>marco.crivellari@suse.com</email>
</author>
<published>2025-09-18T14:24:27Z</published>
<link rel='alternate' type='text/html' href='http://git.waynecole.info/wireguard-linux/commit/?id=27ce71e1ce81875df72f7698ba27988392bef602'/>
<id>urn:sha1:27ce71e1ce81875df72f7698ba27988392bef602</id>
<content type='text'>
Currently if a user enqueue a work item using schedule_delayed_work() the
used wq is "system_wq" (per-cpu wq) while queue_delayed_work() use
WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to
schedule_work() that is using system_wq and queue_work(), that makes use
again of WORK_CPU_UNBOUND.
This lack of consistentcy cannot be addressed without refactoring the API.

alloc_workqueue() treats all queues as per-CPU by default, while unbound
workqueues must opt-in via WQ_UNBOUND.

This default is suboptimal: most workloads benefit from unbound queues,
allowing the scheduler to place worker threads where they’re needed and
reducing noise when CPUs are isolated.

This change adds a new WQ_PERCPU flag at the network subsystem, to explicitly
request the use of the per-CPU behavior. Both flags coexist for one release
cycle to allow callers to transition their calls.

Once migration is complete, WQ_UNBOUND can be removed and unbound will
become the implicit default.

With the introduction of the WQ_PERCPU flag (equivalent to !WQ_UNBOUND),
any alloc_workqueue() caller that doesn’t explicitly specify WQ_UNBOUND
must now use WQ_PERCPU.

All existing users have been updated accordingly.

Suggested-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Marco Crivellari &lt;marco.crivellari@suse.com&gt;
Link: https://patch.msgid.link/20250918142427.309519-4-marco.crivellari@suse.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>net/rds: remove unused struct 'rds_ib_dereg_odp_mr'</title>
<updated>2024-10-03T23:42:52Z</updated>
<author>
<name>Dr. David Alan Gilbert</name>
<email>linux@treblig.org</email>
</author>
<published>2024-09-30T13:43:58Z</published>
<link rel='alternate' type='text/html' href='http://git.waynecole.info/wireguard-linux/commit/?id=25ba2a5adab2f4e660be631b50f64b7ea218af33'/>
<id>urn:sha1:25ba2a5adab2f4e660be631b50f64b7ea218af33</id>
<content type='text'>
'rds_ib_dereg_odp_mr' has been unused since the original
commit 2eafa1746f17 ("net/rds: Handle ODP mr
registration/unregistration").

Remove it.

Signed-off-by: Dr. David Alan Gilbert &lt;linux@treblig.org&gt;
Reviewed-by: Simon Horman &lt;horms@kernel.org&gt;
Reviewed-by: Allison Henderson &lt;allison.henderson@oracle.com&gt;
Reviewed-by: Zhu Yanjun &lt;yanjun.zhu@linux.dev&gt;
Link: https://patch.msgid.link/20240930134358.48647-1-linux@treblig.org
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>RDMA/rds: Remove FMR support for memory registration</title>
<updated>2020-06-02T23:32:53Z</updated>
<author>
<name>Max Gurtovoy</name>
<email>maxg@mellanox.com</email>
</author>
<published>2020-05-28T19:45:45Z</published>
<link rel='alternate' type='text/html' href='http://git.waynecole.info/wireguard-linux/commit/?id=07549ee21ce5247143ffb069bf838025d86b908c'/>
<id>urn:sha1:07549ee21ce5247143ffb069bf838025d86b908c</id>
<content type='text'>
Use FRWR method for memory registration by default and remove the ancient
and unsafe FMR method.

Link: https://lore.kernel.org/r/3-v3-f58e6669d5d3+2cf-fmr_removal_jgg@mellanox.com
Signed-off-by: Max Gurtovoy &lt;maxg@mellanox.com&gt;
Signed-off-by: Jason Gunthorpe &lt;jgg@mellanox.com&gt;
</content>
</entry>
<entry>
<title>net/rds: Use prefetch for On-Demand-Paging MR</title>
<updated>2020-01-18T09:48:19Z</updated>
<author>
<name>Hans Westgaard Ry</name>
<email>hans.westgaard.ry@oracle.com</email>
</author>
<published>2020-01-15T12:43:40Z</published>
<link rel='alternate' type='text/html' href='http://git.waynecole.info/wireguard-linux/commit/?id=b2dfc6765e45a3154800333234e4952b5412d792'/>
<id>urn:sha1:b2dfc6765e45a3154800333234e4952b5412d792</id>
<content type='text'>
Try prefetching pages when using On-Demand-Paging MR using
ib_advise_mr.

Signed-off-by: Hans Westgaard Ry &lt;hans.westgaard.ry@oracle.com&gt;
Acked-by: Santosh Shilimkar &lt;santosh.shilimkar@oracle.com&gt;
Signed-off-by: Leon Romanovsky &lt;leonro@mellanox.com&gt;
</content>
</entry>
<entry>
<title>net/rds: Handle ODP mr registration/unregistration</title>
<updated>2020-01-18T09:48:19Z</updated>
<author>
<name>Hans Westgaard Ry</name>
<email>hans.westgaard.ry@oracle.com</email>
</author>
<published>2020-01-15T12:43:39Z</published>
<link rel='alternate' type='text/html' href='http://git.waynecole.info/wireguard-linux/commit/?id=2eafa1746f17872483d1033b0116ec71435ea19d'/>
<id>urn:sha1:2eafa1746f17872483d1033b0116ec71435ea19d</id>
<content type='text'>
On-Demand-Paging MRs are registered using ib_reg_user_mr and
unregistered with ib_dereg_mr.

Signed-off-by: Hans Westgaard Ry &lt;hans.westgaard.ry@oracle.com&gt;
Acked-by: Santosh Shilimkar &lt;santosh.shilimkar@oracle.com&gt;
Signed-off-by: Leon Romanovsky &lt;leonro@mellanox.com&gt;
</content>
</entry>
<entry>
<title>net/rds: Fix NULL/ERR_PTR inconsistency</title>
<updated>2019-07-17T19:06:52Z</updated>
<author>
<name>Gerd Rausch</name>
<email>gerd.rausch@oracle.com</email>
</author>
<published>2019-07-16T22:29:07Z</published>
<link rel='alternate' type='text/html' href='http://git.waynecole.info/wireguard-linux/commit/?id=aea01a2234d26ffa9d9ee01e43705824c0c7b08a'/>
<id>urn:sha1:aea01a2234d26ffa9d9ee01e43705824c0c7b08a</id>
<content type='text'>
Make function "rds_ib_try_reuse_ibmr" return NULL in case
memory region could not be allocated, since callers
simply check if the return value is not NULL.

Signed-off-by: Gerd Rausch &lt;gerd.rausch@oracle.com&gt;
Acked-by: Santosh Shilimkar &lt;santosh.shilimkar@oracle.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
</feed>
