Age | Commit message (Collapse) | Author |
|
Pull io_uring fixes from Jens Axboe:
- Removed function that became unused after last week's merge (Jiapeng)
- Two small fixes for kbuf recycling (Pavel)
- Include address copy for zc send for POLLFIRST (Pavel)
- Fix for short IO handling in the normal read/write path (Pavel)
* tag 'io_uring-6.0-2022-09-09' of git://git.kernel.dk/linux-block:
io_uring/rw: fix short rw error handling
io_uring/net: copy addr for zc on POLL_FIRST
io_uring: recycle kbuf recycle on tw requeue
io_uring/kbuf: fix not advancing READV kbuf ring
io_uring/notif: Remove the unused function io_notif_complete()
|
|
We have a couple of problems, first reports of unexpected link breakage
for reads when cqe->res indicates that the IO was done in full. The
reason here is partial IO with retries.
TL;DR; we compare the result in __io_complete_rw_common() against
req->cqe.res, but req->cqe.res doesn't store the full length but rather
the length left to be done. So, when we pass the full corrected result
via kiocb_done() -> __io_complete_rw_common(), it fails.
The second problem is that we don't try to correct res in
io_complete_rw(), which, for instance, might be a problem for O_DIRECT
but when a prefix of data was cached in the page cache. We also
definitely don't want to pass a corrected result into io_rw_done().
The fix here is to leave __io_complete_rw_common() alone, always pass
not corrected result into it and fix it up as the last step just before
actually finishing the I/O.
Cc: stable@vger.kernel.org
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://github.com/axboe/liburing/issues/643
Reported-by: Beld Zhang <beldzhang@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Every time we return from an issue handler and expect the request to be
retried we should also setup it for async exec ourselves. Do that when
we return on IORING_RECVSEND_POLL_FIRST in io_sendzc(), otherwise it'll
re-read the address, which might be a surprise for the userspace.
Fixes: 092aeedb750a9 ("io_uring: allow to pass addr into sendzc")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/ab1d0657890d6721339c56d2e161a4bba06f85d0.1662642013.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
When we queue a request via tw for execution it's not going to be
executed immediately, so when io_queue_async() hits IO_APOLL_READY
and queues a tw but doesn't try to recycle/consume the buffer some other
request may try to use the the buffer.
Fixes: c7fb19428d67 ("io_uring: add support for ring mapped supplied buffers")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/a19bc9e211e3184215a58e129b62f440180e9212.1662480490.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
When we don't recycle a selected ring buffer we should advance the head
of the ring, so don't just skip io_kbuf_recycle() for IORING_OP_READV
but adjust the ring.
Fixes: 934447a603b22 ("io_uring: do not recycle buffer in READV")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/a6d85e2611471bcb5d5dcd63a8342077ddc2d73d.1662480490.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
The function io_notif_complete() is defined in the notif.c file, but not
called elsewhere, so delete this unused function.
io_uring/notif.c:24:20: warning: unused function 'io_notif_complete' [-Wunused-function].
Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=2047
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/20220905020436.51894-1-jiapeng.chong@linux.alibaba.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Pull io_uring fixes from Jens Axboe:
- A single fix for over-eager retries for networking (Pavel)
- Revert the notification slot support for zerocopy sends.
It turns out that even after more than a year or development and
testing, there's not full agreement on whether just using plain
ordered notifications is Good Enough to avoid the complexity of using
the notifications slots. Because of that, we decided that it's best
left to a future final decision.
We can always bring back this feature, but we can't really change it
or remove it once we've released 6.0 with it enabled. The reverts
leave the usual CQE notifications as the primary interface for
knowing when data was sent, and when it was acked. (Pavel)
* tag 'io_uring-6.0-2022-09-02' of git://git.kernel.dk/linux-block:
selftests/net: return back io_uring zc send tests
io_uring/net: simplify zerocopy send user API
io_uring/notif: remove notif registration
Revert "io_uring: rename IORING_OP_FILES_UPDATE"
Revert "io_uring: add zc notification flush requests"
selftests/net: temporarily disable io_uring zc test
io_uring/net: fix overexcessive retries
|
|
Following user feedback, this patch simplifies zerocopy send API. One of
the main complaints is that the current API is difficult with the
userspace managing notification slots, and then send retries with error
handling make it even worse.
Instead of keeping notification slots change it to the per-request
notifications model, which posts both completion and notification CQEs
for each request when any data has been sent, and only one CQE if it
fails. All notification CQEs will have IORING_CQE_F_NOTIF set and
IORING_CQE_F_MORE in completion CQEs indicates whether to wait a
notification or not.
IOSQE_CQE_SKIP_SUCCESS is disallowed with zerocopy sends for now.
This is less flexible, but greatly simplifies the user API and also the
kernel implementation. We reuse notif helpers in this patch, but in the
future there won't be need for keeping two requests.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/95287640ab98fc9417370afb16e310677c63e6ce.1662027856.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
We're going to remove the userspace exposed zerocopy notification API,
remove notification registration.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/6ff00b97be99869c386958a990593c9c31cf105b.1662027856.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
This reverts commit 4379d5f15b3fd4224c37841029178aa8082a242e.
We removed notification flushing, also cleanup uapi preparation changes
to not pollute it.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/89edc3905350f91e1b6e26d9dbf42ee44fd451a2.1662027856.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
This reverts commit 492dddb4f6e3a5839c27d41ff1fecdbe6c3ab851.
Soon we won't have the very notion of notification flushing, so remove
notification flushing requests.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/8850334ca56e65b413cb34fd158db81d7b2865a3.1662027856.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm
Pull LSM support for IORING_OP_URING_CMD from Paul Moore:
"Add SELinux and Smack controls to the io_uring IORING_OP_URING_CMD.
These are necessary as without them the IORING_OP_URING_CMD remains
outside the purview of the LSMs (Luis' LSM patch, Casey's Smack patch,
and my SELinux patch). They have been discussed at length with the
io_uring folks, and Jens has given his thumbs-up on the relevant
patches (see the commit descriptions).
There is one patch that is not strictly necessary, but it makes
testing much easier and is very trivial: the /dev/null
IORING_OP_URING_CMD patch."
* tag 'lsm-pr-20220829' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm:
Smack: Provide read control for io_uring_cmd
/dev/null: add IORING_OP_URING_CMD support
selinux: implement the security_uring_cmd() LSM hook
lsm,io_uring: add LSM hooks for the new uring_cmd file op
|
|
Length parameter of io_sg_from_iter() can be smaller than the iterator's
size, as it's with TCP, so when we set from->count at the end of the
function we truncate the iterator forcing TCP to return preliminary with
a short send. It affects zerocopy sends with large payload sizes and
leads to retries and possible request failures.
Fixes: 3ff1a0d395c00 ("io_uring: enable managed frags with register buffers")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/0bc0d5179c665b4ef5c328377c84c7a1f298467e.1661530037.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
io-uring cmd support was added through ee692a21e9bf ("fs,io_uring:
add infrastructure for uring-cmd"), this extended the struct
file_operations to allow a new command which each subsystem can use
to enable command passthrough. Add an LSM specific for the command
passthrough which enables LSMs to inspect the command details.
This was discussed long ago without no clear pointer for something
conclusive, so this enables LSMs to at least reject this new file
operation.
[0] https://lkml.kernel.org/r/8adf55db-7bab-f59d-d612-ed906b948d19@schaufler-ca.com
Cc: stable@vger.kernel.org
Fixes: ee692a21e9bf ("fs,io_uring: add infrastructure for uring-cmd")
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Acked-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Paul Moore <paul@paul-moore.com>
|
|
We usually copy all bits that a request needs from the userspace for
async execution, so the userspace can keep them on the stack. However,
send zerocopy violates this pattern for addresses and may reloads it
e.g. from io-wq. Save the address if any in ->async_data as usual.
Reported-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/d7512d7aa9abcd36e9afe1a4d292a24cb2d157e5.1661342812.git.asml.silence@gmail.com
[axboe: fold in incremental fix]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
There are opcodes that need ->async_data only in some cases and
allocation it unconditionally may hurt performance. Add an option to
opdef to make move the allocation part from the core io_uring to opcode
specific code.
Note, we can't just set opdef->async_size to zero because there are
other helpers that rely on it, e.g. io_alloc_async_data().
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/9dc62be9e88dd0ed63c48365340e8922d2498293.1661342812.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Currently, there is no ordering between notification CQEs and
completions of the send flushing it, this quite complicates the
userspace, especially since we don't flush notification when the
send(+flush) request fails, i.e. there will be only one CQE. What we
can do is to make sure that notification completions come only after
sends.
The easiest way to achieve this is to not try to complete a notification
inline from io_sendzc() but defer it to task_work, considering that
io-wq sendzc is disallowed CQEs will be naturally ordered because
task_works will only be executed after we're done with submission and so
inline completion.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/cddfd1c2bf91f22b9fe08e13b7dffdd8f858a151.1661342812.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Fix up indentation before we get complaints from tooling.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/bd5754e3764215ccd7fb04cd636ea9167aaa275d.1661342812.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Failed requests should be marked with req_set_fail(), so links and cqe
skipping work correctly, which is missing in io_sendzc(). Note,
io_sendzc() return IOU_OK on failure, so the core code won't do the
cleanup for us.
Fixes: 06a5464be84e4 ("io_uring: wire send zc request type")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/e47d46fda9db30154ce66a549bb0d3380b780520.1661342812.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Fix up the io_alloc_notif()'s __must_hold as we don't have a ctx
argument there but should get it from the slot instead.
Reported-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/cbb0a920f18e0aed590bf58300af817b9befb8a3.1661342812.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
If ->uring_cmd returned an error value different from -EAGAIN or
-EIOCBQUEUED, it gets overridden with IOU_OK. This invites trouble
as caller (io_uring core code) handles IOU_OK differently than other
error codes.
Fix this by returning the actual error code.
Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
The passed in index should be validated against the number of registered
files we have, it needs to be smaller than the index value to avoid going
one beyond the end.
Fixes: 78a861b94959 ("io_uring: add sync cancelation API through io_uring_register()")
Reported-by: Luo Likang <luolikang@nsfocus.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
There is another spot where we check ->async_data directly instead of
using req_has_async_data(), which is the way to do it, fix it up.
Fixes: 43e0bbbd0b0e3 ("io_uring: add netmsg cache")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/42f33b9a81dd6ae65dda92f0372b0ff82d548517.1660822636.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
1024 notification slots is rather an arbitrary value, raise it up,
everything is accounted to memcg.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/eb78a0a5f2fa5941f8e845cdae5fb399bf7ba0be.1660566179.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
We may account memory to a memcg of a request that didn't even got to
the network layer. It's not a bug as it'll be routinely cleaned up on
flush, but it might be confusing for the userspace.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/b8aae61f4c3ddc4da97c1da876bb73871f352d50.1660566179.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
We have a helper that checks for whether a request contains anything in
->async_data or not, namely req_has_async_data(). It's better to use it
as it might have some extra considerations.
Fixes: 43e0bbbd0b0e3 ("io_uring: add netmsg cache")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/b7414da4e7c3c32c31fc02dfd1355af4ccf4ca5f.1660566179.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Pull io_uring fixes from Jens Axboe:
- Regression fix for this merge window, fixing a wrong order of
arguments for io_req_set_res() for passthru (Dylan)
- Fix for the audit code leaking context memory (Peilin)
- Ensure that provided buffers are memcg accounted (Pavel)
- Correctly handle short zero-copy sends (Pavel)
- Sparse warning fixes for the recvmsg multishot command (Dylan)
- Error handling fix for passthru (Anuj)
- Remove randomization of struct kiocb fields, to avoid it growing in
size if re-arranged in such a fashion that it grows more holes or
padding (Keith, Linus)
- Small series improving type safety of the sqe fields (Stefan)
* tag 'io_uring-6.0-2022-08-13' of git://git.kernel.dk/linux-block:
io_uring: add missing BUILD_BUG_ON() checks for new io_uring_sqe fields
io_uring: make io_kiocb_to_cmd() typesafe
fs: don't randomize struct kiocb fields
io_uring: consistently make use of io_notif_to_data()
io_uring: fix error handling for io_uring_cmd
io_uring: fix io_recvmsg_prep_multishot sparse warnings
io_uring/net: send retry for zerocopy
io_uring: mem-account pbuf buckets
audit, io_uring, io-wq: Fix memory leak in io_sq_thread() and io_wqe_worker()
io_uring: pass correct parameters to io_req_set_res
|
|
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Link: https://lore.kernel.org/r/ffcaf8dc4778db4af673822df60dbda6efdd3065.1660201408.git.metze@samba.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
We need to make sure (at build time) that struct io_cmd_data is not
casted to a structure that's larger.
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Link: https://lore.kernel.org/r/c024cdf25ae19fc0319d4180e2298bade8ed17b8.1660201408.git.metze@samba.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
This makes the assignment typesafe. It prepares
changing io_kiocb_to_cmd() in the next commit.
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Link: https://lore.kernel.org/r/8da6e9d12cf95ad4bc73274406d12bca7aabf72e.1660201408.git.metze@samba.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Commit 97b388d70b53 ("io_uring: handle completions in the core") moved the
error handling from handler to core. But for io_uring_cmd handler we end
up completing more than once (both in handler and in core) leading to
use_after_free.
Change io_uring_cmd handler to avoid calling io_uring_cmd_done in case
of error.
Fixes: 97b388d70b53 ("io_uring: handle completions in the core")
Signed-off-by: Anuj Gupta <anuj20.g@samsung.com>
Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
Link: https://lore.kernel.org/r/20220811091459.6929-1-anuj20.g@samsung.com
[axboe: fix ret vs req typo]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Fix casts missing the __user parts. This seemed to only cause errors on
the alpha build, or if checked with sparse, but it was definitely an
oversight.
Reported-by: kernel test robot <lkp@intel.com>
Fixes: 9bb66906f23e ("io_uring: support multishot in recvmsg")
Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220805115450.3921352-1-dylany@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
io_uring handles short sends/recvs for stream sockets when MSG_WAITALL
is set, however new zerocopy send is inconsistent in this regard, which
might be confusing. Handle short sends.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/b876a4838597d9bba4f3215db60d72c33c448ad0.1659622472.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Potentially, someone may create as many pbuf bucket as there are indexes
in an xarray without any other restrictions bounding our memory usage,
put memory needed for the buckets under memory accounting.
Cc: <stable@vger.kernel.org>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/d34c452e45793e978d26e2606211ec9070d329ea.1659622312.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Currently @audit_context is allocated twice for io_uring workers:
1. copy_process() calls audit_alloc();
2. io_sq_thread() or io_wqe_worker() calls audit_alloc_kernel() (which
is effectively audit_alloc()) and overwrites @audit_context,
causing:
BUG: memory leak
unreferenced object 0xffff888144547400 (size 1024):
<...>
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<ffffffff8135cfc3>] audit_alloc+0x133/0x210
[<ffffffff81239e63>] copy_process+0xcd3/0x2340
[<ffffffff8123b5f3>] create_io_thread+0x63/0x90
[<ffffffff81686604>] create_io_worker+0xb4/0x230
[<ffffffff81686f68>] io_wqe_enqueue+0x248/0x3b0
[<ffffffff8167663a>] io_queue_iowq+0xba/0x200
[<ffffffff816768b3>] io_queue_async+0x113/0x180
[<ffffffff816840df>] io_req_task_submit+0x18f/0x1a0
[<ffffffff816841cd>] io_apoll_task_func+0xdd/0x120
[<ffffffff8167d49f>] tctx_task_work+0x11f/0x570
[<ffffffff81272c4e>] task_work_run+0x7e/0xc0
[<ffffffff8125a688>] get_signal+0xc18/0xf10
[<ffffffff8111645b>] arch_do_signal_or_restart+0x2b/0x730
[<ffffffff812ea44e>] exit_to_user_mode_prepare+0x5e/0x180
[<ffffffff844ae1b2>] syscall_exit_to_user_mode+0x12/0x20
[<ffffffff844a7e80>] do_syscall_64+0x40/0x80
Then,
3. io_sq_thread() or io_wqe_worker() frees @audit_context using
audit_free();
4. do_exit() eventually calls audit_free() again, which is okay
because audit_free() does a NULL check.
As suggested by Paul Moore, fix it by deleting audit_alloc_kernel() and
redundant audit_free() calls.
Fixes: 5bd2182d58e9 ("audit,io_uring,io-wq: add some basic audit support to io_uring")
Suggested-by: Paul Moore <paul@paul-moore.com>
Cc: stable@vger.kernel.org
Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Link: https://lore.kernel.org/r/20220803222343.31673-1-yepeilin.cs@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs iov_iter updates from Al Viro:
"Part 1 - isolated cleanups and optimizations.
One of the goals is to reduce the overhead of using ->read_iter() and
->write_iter() instead of ->read()/->write().
new_sync_{read,write}() has a surprising amount of overhead, in
particular inside iocb_flags(). That's the explanation for the
beginning of the series is in this pile; it's not directly
iov_iter-related, but it's a part of the same work..."
* tag 'pull-work.iov_iter-base' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
first_iovec_segment(): just return address
iov_iter: massage calling conventions for first_{iovec,bvec}_segment()
iov_iter: first_{iovec,bvec}_segment() - simplify a bit
iov_iter: lift dealing with maxpages out of first_{iovec,bvec}_segment()
iov_iter_get_pages{,_alloc}(): cap the maxsize with MAX_RW_COUNT
iov_iter_bvec_advance(): don't bother with bvec_iter
copy_page_{to,from}_iter(): switch iovec variants to generic
keep iocb_flags() result cached in struct file
iocb: delay evaluation of IS_SYNC(...) until we want to check IOCB_DSYNC
struct file: use anonymous union member for rcuhead and llist
btrfs: use IOMAP_DIO_NOSYNC
teach iomap_dio_rw() to suppress dsync
No need of likely/unlikely on calls of check_copy_size()
|
|
The two parameters of 'res' and 'cflags' are swapped, so fix it.
Without this fix, 'ublk del' hangs forever.
Cc: Pavel Begunkov <asml.silence@gmail.com>
Fixes: de23077eda61f ("io_uring: set completion results upfront")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20220803120757.1668278-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
git://git.kernel.dk/linux-block
Pull io_uring zerocopy support from Jens Axboe:
"This adds support for efficient support for zerocopy sends through
io_uring. Both ipv4 and ipv6 is supported, as well as both TCP and
UDP.
The core network changes to support this is in a stable branch from
Jakub that both io_uring and net-next has pulled in, and the io_uring
changes are layered on top of that.
All of the work has been done by Pavel"
* tag 'for-5.20/io_uring-zerocopy-send-2022-07-29' of git://git.kernel.dk/linux-block: (34 commits)
io_uring: notification completion optimisation
io_uring: export req alloc from core
io_uring/net: use unsigned for flags
io_uring/net: make page accounting more consistent
io_uring/net: checks errors of zc mem accounting
io_uring/net: improve io_get_notif_slot types
selftests/io_uring: test zerocopy send
io_uring: enable managed frags with register buffers
io_uring: add zc notification flush requests
io_uring: rename IORING_OP_FILES_UPDATE
io_uring: flush notifiers after sendzc
io_uring: sendzc with fixed buffers
io_uring: allow to pass addr into sendzc
io_uring: account locked pages for non-fixed zc
io_uring: wire send zc request type
io_uring: add notification slot registration
io_uring: add rsrc referencing for notifiers
io_uring: complete notifiers in tw
io_uring: cache struct io_notif
io_uring: add zc notification infrastructure
...
|
|
We want to use all optimisations that we have for io_uring requests like
completion batching, memory caching and more but for zc notifications.
Fortunately, notification perfectly fit the request model so we can
overlay them onto struct io_kiocb and use all the infratructure.
Most of the fields of struct io_notif natively fits into io_kiocb, so we
replace struct io_notif with struct io_kiocb carrying struct
io_notif_data in the cmd cache line. Then we adapt io_alloc_notif() to
use io_alloc_req()/io_alloc_req_refill(), and kill leftovers of hand
coded caching. __io_notif_complete_tw() is converted to use io_uring's
tw infra.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/9e010125175e80baf51f0ca63bdc7cc6a4a9fa56.1658913593.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
We want to do request allocation out of the core io_uring code, make the
allocation functions public for other io_uring parts.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/0314fedd3a02a514210ba42d4720332538c65956.1658913593.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Use unsigned int type for msg flags.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/5cfaed13d3191337b14b8664ca68b515d9e2d1b4.1658742118.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Make network page accounting more consistent with how buffer
registration is working, i.e. account all memory to ctx->user.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/4aacfe64bbb81b27f9ecf5d5c219c69a07e5aa56.1658742118.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
mm_account_pinned_pages() may fail, don't ignore the return value.
Fixes: e29e3bd4b968 ("io_uring: account locked pages for non-fixed zc")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/dae0542ed8e6706071bb83ad3e7ad6a70d207fd9.1658742118.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Don't use signed int for slot indexing.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/e4d15aefebb5e55729dd9b5ec01ab16b70033343.1658742118.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
io_uring's registered buffers infra has a good performant way of pinning
pages, so let's use SKBFL_MANAGED_FRAG_REFS when our requests are purely
register buffer backed.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/278731d3f20caf346cfc025fbee0b4c9ee4ed751.1657643355.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Overlay notification control onto IORING_OP_RSRC_UPDATE (former
IORING_OP_FILES_UPDATE). It allows to flush a range of zc notifications
from slots with indexes [sqe->off, sqe->off+sqe->len). If sqe->arg is
not zero, it also copies sqe->arg as a new tag for all flushed
notifications.
Note, it doesn't flush a notification of a slot if there was no requests
attached to it (since last flush or registration).
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/df13e2363400682a73dd9e71c3b990b8d1ff0333.1657643355.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
IORING_OP_FILES_UPDATE will be a more generic opcode serving different
resource types, rename it into IORING_OP_RSRC_UPDATE and add subtype
handling.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/0a907133907d9af3415a8a7aa1802c6aa97c03c6.1657643355.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Allow to flush notifiers as a part of sendzc request by setting
IORING_SENDZC_FLUSH flag. When the sendzc request succeedes it will
flush the used [active] notifier.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/e0b4d9a6797e2fd6092824fe42953db7a519bbc8.1657643355.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Allow zerocopy sends to use fixed buffers. There is an optimisation for
this case, the network layer don't need to reference the pages, see
SKBFL_MANAGED_FRAG_REFS, so io_uring have to ensure validity of fixed
buffers until the notifier is released.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/e1d8bd1b5934e541d90c1824eb4020ae3f5f43f3.1657643355.git.asml.silence@gmail.com
[axboe: fold in 32-bit pointer cast warning fix]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Allow to specify an address to zerocopy sends making it more like
sendto(2).
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/70417a8f7c5b51ab454690bae08adc0c187f89e8.1657643355.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|