aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2022-08-02block: ensure bio_iov_add_page can't failKeith Busch
Adding the page could fail on the bio_full() condition, which checks for either exceeding the bio's max segments or total size exceeding UINT_MAX. We already ensure the max segments can't be exceeded, so just ensure the total size won't reach the limit. This simplifies error handling and removes unnecessary repeated bio_full() checks. Signed-off-by: Keith Busch <kbusch@kernel.org> Link: https://lore.kernel.org/r/20220712153256.2202024-2-kbusch@fb.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-02block: ensure iov_iter advances for added pagesKeith Busch
There are cases where a bio may not accept additional pages, and the iov needs to advance to the last data length that was accepted. The zone append used to handle this correctly, but was inadvertently broken when the setup was made common with the normal r/w case. Fixes: 576ed9135489c ("block: use bio_add_page in bio_iov_iter_get_pages") Fixes: c58c0074c54c2 ("block/bio: remove duplicate append pages code") Signed-off-by: Keith Busch <kbusch@kernel.org> Link: https://lore.kernel.org/r/20220712153256.2202024-1-kbusch@fb.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-02drivers:md:fix a potential use-after-free bugWentao_Liang
In line 2884, "raid5_release_stripe(sh);" drops the reference to sh and may cause sh to be released. However, sh is subsequently used in lines 2886 "if (sh->batch_head && sh != sh->batch_head)". This may result in an use-after-free bug. It can be fixed by moving "raid5_release_stripe(sh);" to the bottom of the function. Signed-off-by: Wentao_Liang <Wentao_Liang_g@163.com> Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-02md/raid5: Ensure batch_last is released before sleeping for quiesceLogan Gunthorpe
A race condition exists where if raid5_quiesce() is called in the middle of a request that has set batch_last, it will deadlock. batch_last will hold a reference to a stripe when raid5_quiesce() is called. This will cause the next raid5_get_active_stripe() call to sleep waiting for the quiesce to finish, but the raid5_quiesce() thread will wait for active_stripes to go to zero which will never happen because request thread is waiting for the quiesce to stop. Fix this by creating a special __raid5_get_active_stripe() function which takes the request context and clears the last_batch before sleeping. While we're at it, change the arguments of raid5_get_active_stripe() to bools. Fixes: 3312e6c887fe ("md/raid5: Keep a reference to last stripe_head for batch") Reported-by: David Sloan <David.Sloan@eideticom.com> Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-02md/raid5: Move stripe_request_ctx upLogan Gunthorpe
Move stripe_request_ctx up. No functional changes intended. This will be necessary in the next patch to release the batch_last in the context before sleeping. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-02md/raid5: Drop unnecessary call to r5c_check_stripe_cache_usage()Logan Gunthorpe
Now that raid5_get_active_stripe() has been refactored it is appearant that r5c_check_stripe_cache_usage() doesn't need to be called in the wait_for_stripe branch. r5c_check_stripe_cache_usage() will only conditionally call r5l_wake_reclaim(), but that function is called two lines later. Drop the call for cleanup. Reported-by: Martin Oliveira <martin.oliveira@eideticom.com> Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-02md/raid5: Make is_inactive_blocked() helperLogan Gunthorpe
The logic to wait_for_stripe is difficult to parse being on so many lines and with confusing operator precedence. Move it to a helper function to make it easier to read. No functional changes intended. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-02md/raid5: Refactor raid5_get_active_stripe()Logan Gunthorpe
Refactor the raid5_get_active_stripe() to read more linearly in the order it's typically executed. The init_stripe() call is called if a free stripe is found and the function is exited early which removes a lot of if (sh) checks and unindents the following code. Remove the while loop in favour of the 'goto retry' pattern, which reduces indentation further. And use a 'goto wait_for_stripe' instead of an additional indent seeing it is the unusual path and this makes the code easier to read. No functional changes intended. Will make subsequent changes in patches easier to understand. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-02block: pass struct queue_limits to the bio splitting helpersChristoph Hellwig
Allow using the splitting helpers on just a queue_limits instead of a full request_queue structure. This will eventually allow file systems or remapping drivers to split REQ_OP_ZONE_APPEND bios based on limits calculated as the minimum common capabilities over multiple devices. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20220727162300.3089193-7-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-02block: move bio_allowed_max_sectors to blk-merge.cChristoph Hellwig
Move this helper into the only file where it is used. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20220727162300.3089193-6-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-02block: move the call to get_max_io_size out of blk_bio_segment_splitChristoph Hellwig
Prepare for reusing blk_bio_segment_split for (file system controlled) splits of REQ_OP_ZONE_APPEND bios by letting the caller control the maximum size of the bio. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20220727162300.3089193-5-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-02block: move ->bio_split to the gendiskChristoph Hellwig
Only non-passthrough requests are split by the block layer and use the ->bio_split bio_set. Move it from the request_queue to the gendisk. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20220727162300.3089193-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-02block: change the blk_queue_bounce calling conventionChristoph Hellwig
The double indirect bio leads to somewhat suboptimal code generation. Instead return the (original or split) bio, and make sure the request_queue arguments to the lower level helpers is passed after the bio to avoid constant reshuffling of the argument passing registers. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20220727162300.3089193-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-02block: change the blk_queue_split calling conventionChristoph Hellwig
The double indirect bio leads to somewhat suboptimal code generation. Instead return the (original or split) bio, and make sure the request_queue arguments to the lower level helpers is passed after the bio to avoid constant reshuffling of the argument passing registers. Also give it and the helpers used to implement it more descriptive names. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220727162300.3089193-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-02nvme: update MAINTAINERS for the new auth codeChristoph Hellwig
Add the common subdirectory and match all nvme* headers in include/linux/. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-02nvmet-tcp: fix lockdep complaint on nvmet_tcp_wq flush during queue teardownSagi Grimberg
We probably need nvmet_tcp_wq to have MEM_RECLAIM as we are sending/receiving for the socket from works on this workqueue. Also this eliminates lockdep complaints: -- [ 6174.010200] workqueue: WQ_MEM_RECLAIM nvmet-wq:nvmet_tcp_release_queue_work [nvmet_tcp] is flushing !WQ_MEM_RECLAIM nvmet_tcp_wq:nvmet_tcp_io_work [nvmet_tcp] [ 6174.010216] WARNING: CPU: 20 PID: 14456 at kernel/workqueue.c:2628 check_flush_dependency+0x110/0x14c Reported-by: Yi Zhang <yi.zhang@redhat.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-02nvme: enable generic interface (/dev/ngXnY) for unknown command setsJoel Granados
Extend nvme_alloc_ns() and nvme_validate_ns() for unknown command-set as well. Both are made to use a new helper (nvme_update_ns_info_cs_indep) which is similar to nvme_update_ns_info but performs fewer operations to get the generic interface up. Suggested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Joel Granados <j.granados@samsung.com> Signed-off-by: Kanchan Joshi <joshi.k@samsung.com> [hch: rebased on other refactoring patches] Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Javier González <javier.gonz@samsung.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-02nvme: factor out a nvme_ns_is_readonly helperChristoph Hellwig
Add a little helper to check if a namespace should be marked read-only that uses a new is_readonly flag in the nvme_ns_info structure. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Javier González <javier.gonz@samsung.com> Reviewed-by: Joel Granados <j.granados@samsung.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-02nvme: refactor namespace probingChristoph Hellwig
Change nvme_ns_scan to gather all information needed for generic namespace setup into a nvme_ns_info structure. This structure is filled from the Command Set Idependent Identify Namespace data structure if it is available or else the legacy Identify namespace structure. With that everything related to the NVM command set (and the ZNS command set derived from it) can be encapsulated in the nvme_update_ns_info_block function while keeping the rest of the namespace probing generic. The downside is that we now always issue two Identify Namespace calls for each probed namespace instead of usually just a single one previously. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Javier González <javier.gonz@samsung.com> Reviewed-by: Joel Granados <j.granados@samsung.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-02nvme: generalize the nvme_multi_css check in nvme_scan_nsChristoph Hellwig
Check for multiple command set support early on an error out if is not supported when a !NVM command set namespace is found. This prepares for adding command set independent passthrough support. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Javier González <javier.gonz@samsung.com> Reviewed-by: Joel Granados <j.granados@samsung.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-02nvme: rename nvme_validate_or_alloc_ns to nvme_scan_nsChristoph Hellwig
This shorter name much better fits what this function does in the scanning process. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Javier González <javier.gonz@samsung.com> Reviewed-by: Joel Granados <j.granados@samsung.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-02nvme: catch -ENODEV from nvme_revalidate_zones againChristoph Hellwig
nvme_revalidate_zones can also return -ENODEV if e.g. zone sizes aren't constant or not a power of two. In that case we should jump to marking the gendisk hidden and only support pass through. Fixes: 602e57c9799c ("nvme: also mark passthrough-only namespaces ready in nvme_update_ns_info") Reported-by: Joel Granados <j.granados@samsung.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Joel Granados <j.granados@samsung.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-02nvmet-auth: select the intended CRYPTO_DH_RFC7919_GROUPSLukas Bulwahn
Commit 71ebe3842ebe ("nvmet-auth: Diffie-Hellman key exchange support") intends to select 'Support for RFC 7919 FFDHE group parameters' for using FFDHE groups for NVMe In-Band Authentication. It however selects CRYPTO_DH_GROUPS_RFC7919, instead of the intended CRYPTO_DH_RFC7919_GROUPS; notice the swapping of words here. Correct the select to the intended config option. Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-02nvmet-auth: fix return value check in auth receiveChaitanya Kulkarni
nvmet_auth_challenge() return type is int and currently it uses status variable that is of type u16 in nvmet_execute_auth_receive(). Catch the return value of nvmet_auth_challenge() into int and set the NVME_SC_INTERNAL as status variable before we jump to error. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-02nvmet-auth: fix return value check in auth sendChaitanya Kulkarni
nvmet_setup_auth() return type is int and currently it uses status variable that is of type u16 in nvmet_execute_auth_send(). Catch the return value of nvmet_setup_auth() into int and set the NVME_SC_INTERNAL as status variable before we jump to error. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-02nvmet-auth: fix a couple of spelling mistakesColin Ian King
There are a couple of spelling mistakes in pr_warn and pr_debug messages. Fix them. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-02nvmet: fix a format specifier in nvmet_auth_ctrl_exponentialChristoph Hellwig
dh_keysize is a size_t, use the proper format specifier for printing it. Reported-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Hannes Reinecke <hare@sues.de> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-02nvmet: don't check for NULL pointer before kfree in nvmet_host_releaseChristoph Hellwig
And add an empty line after the variable declaration. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-02nvme-apple: stop casting function pointer signaturesChristoph Hellwig
Casting function pointers breaks control flow enforcement and is generally a horrible coding style. Add two wrappers to get rid of these casts. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Sven Peter <sven@svenpeter.dev> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-02nvme-tcp: split nvme_tcp_alloc_tagsetChristoph Hellwig
Split nvme_tcp_alloc_tagset into one helper for the admin tag_set and one for the I/O tag set. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-02nvme-rdma: split nvme_rdma_alloc_tagsetChristoph Hellwig
Split nvme_rdma_alloc_tagset into one helper for the admin tag_set and one for the I/O tag set. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-02nvme-pci: split nvme_dev_addChristoph Hellwig
Split nvme_dev_add into a helper to actually allocate the tag set, and one that just update the number of queues. Add a local variable for the tag_set to clean up the code a bit. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-02nvme-pci: split nvme_alloc_admin_tagsChristoph Hellwig
Split nvme_alloc_admin_tags into a helper to actually allocate the tag set, and one that just restarts the admin queue. Add a local variable for the tag_set to clean up the code a bit. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-02nvme-pci: print the command name of aborted commandsChristoph Hellwig
To allow for slightly better debugging, print the command name when aborting an command. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-02nvme-pci: remove useless assignment in nvme_pci_setup_prpsLiu Song
If prp_list is NULL, nvme_unmap_sg will be performed, and the assignment to first_dma is meaningless, so remove it. Signed-off-by: Liu Song <liusong@linux.alibaba.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-02nvme-auth: uninitialized variable in nvme_auth_transform_key()Dan Carpenter
A couple of the early error gotos call kfree_sensitive(transformed_key); before "transformed_key" has been initialized. Fixes: db1312dd9548 ("nvmet: implement basic In-Band Authentication") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-02nvme-auth: fix off by one checksDan Carpenter
The > ARRAY_SIZE() checks need to be >= ARRAY_SIZE() to prevent reading one element beyond the end of the arrays. Fixes: db1312dd9548 ("nvmet: implement basic In-Band Authentication") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-02nvme: define compat_ioctl again to unbreak 32-bit userspace.Nick Bowler
Commit 89b3d6e60550 ("nvme: simplify the compat ioctl handling") removed the initialization of compat_ioctl from the nvme block_device_operations structures. Presumably the expectation was that 32-bit ioctls would be directed through the regular handler but this is not the case: failing to assign .compat_ioctl actually means that the compat case is disabled entirely, and any attempt to submit nvme ioctls from 32-bit userspace fails outright with -ENOTTY. For example: % smartctl -x /dev/nvme0n1 [...] Read NVMe Identify Controller failed: NVME_IOCTL_ADMIN_CMD: Inappropriate ioctl for device The blkdev_compat_ptr_ioctl helper can be used to direct compat calls through the main ioctl handler and makes things work again. Fixes: 89b3d6e60550 ("nvme: simplify the compat ioctl handling") Signed-off-by: Nick Bowler <nbowler@draconx.ca> Reviewed-by: Guixin Liu <kanie@linux.alibaba.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-02nvme: don't always build constants.oChristoph Hellwig
The entire content of constants.c if guarded by an ifdef, so switch to just building the file conditionally instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-02nvme: use command_id instead of req->tag in trace_nvme_complete_rq()Bean Huo
Use command_id instead of req->tag in trace_nvme_complete_rq(), because of commit e7006de6c238 ("nvme: code command_id with a genctr for use authentication after release"), cmd->common.command_id is set to ((genctl & 0xf)< 12 | req->tag), no longer req->tag, which makes cid in trace_nvme_complete_rq and trace_nvme_setup_cmd are not the same. Fixes: e7006de6c238 ("nvme: code command_id with a genctr for use authentication after release") Signed-off-by: Bean Huo <beanhuo@micron.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-02md-raid10: fix KASAN warningMikulas Patocka
There's a KASAN warning in raid10_remove_disk when running the lvm test lvconvert-raid-reshape.sh. We fix this warning by verifying that the value "number" is valid. BUG: KASAN: slab-out-of-bounds in raid10_remove_disk+0x61/0x2a0 [raid10] Read of size 8 at addr ffff889108f3d300 by task mdX_raid10/124682 CPU: 3 PID: 124682 Comm: mdX_raid10 Not tainted 5.19.0-rc6 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x34/0x44 print_report.cold+0x45/0x57a ? __lock_text_start+0x18/0x18 ? raid10_remove_disk+0x61/0x2a0 [raid10] kasan_report+0xa8/0xe0 ? raid10_remove_disk+0x61/0x2a0 [raid10] raid10_remove_disk+0x61/0x2a0 [raid10] Buffer I/O error on dev dm-76, logical block 15344, async page read ? __mutex_unlock_slowpath.constprop.0+0x1e0/0x1e0 remove_and_add_spares+0x367/0x8a0 [md_mod] ? super_written+0x1c0/0x1c0 [md_mod] ? mutex_trylock+0xac/0x120 ? _raw_spin_lock+0x72/0xc0 ? _raw_spin_lock_bh+0xc0/0xc0 md_check_recovery+0x848/0x960 [md_mod] raid10d+0xcf/0x3360 [raid10] ? sched_clock_cpu+0x185/0x1a0 ? rb_erase+0x4d4/0x620 ? var_wake_function+0xe0/0xe0 ? psi_group_change+0x411/0x500 ? preempt_count_sub+0xf/0xc0 ? _raw_spin_lock_irqsave+0x78/0xc0 ? __lock_text_start+0x18/0x18 ? raid10_sync_request+0x36c0/0x36c0 [raid10] ? preempt_count_sub+0xf/0xc0 ? _raw_spin_unlock_irqrestore+0x19/0x40 ? del_timer_sync+0xa9/0x100 ? try_to_del_timer_sync+0xc0/0xc0 ? _raw_spin_lock_irqsave+0x78/0xc0 ? __lock_text_start+0x18/0x18 ? _raw_spin_unlock_irq+0x11/0x24 ? __list_del_entry_valid+0x68/0xa0 ? finish_wait+0xa3/0x100 md_thread+0x161/0x260 [md_mod] ? unregister_md_personality+0xa0/0xa0 [md_mod] ? _raw_spin_lock_irqsave+0x78/0xc0 ? prepare_to_wait_event+0x2c0/0x2c0 ? unregister_md_personality+0xa0/0xa0 [md_mod] kthread+0x148/0x180 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x1f/0x30 </TASK> Allocated by task 124495: kasan_save_stack+0x1e/0x40 __kasan_kmalloc+0x80/0xa0 setup_conf+0x140/0x5c0 [raid10] raid10_run+0x4cd/0x740 [raid10] md_run+0x6f9/0x1300 [md_mod] raid_ctr+0x2531/0x4ac0 [dm_raid] dm_table_add_target+0x2b0/0x620 [dm_mod] table_load+0x1c8/0x400 [dm_mod] ctl_ioctl+0x29e/0x560 [dm_mod] dm_compat_ctl_ioctl+0x7/0x20 [dm_mod] __do_compat_sys_ioctl+0xfa/0x160 do_syscall_64+0x90/0xc0 entry_SYSCALL_64_after_hwframe+0x46/0xb0 Last potentially related work creation: kasan_save_stack+0x1e/0x40 __kasan_record_aux_stack+0x9e/0xc0 kvfree_call_rcu+0x84/0x480 timerfd_release+0x82/0x140 L __fput+0xfa/0x400 task_work_run+0x80/0xc0 exit_to_user_mode_prepare+0x155/0x160 syscall_exit_to_user_mode+0x12/0x40 do_syscall_64+0x42/0xc0 entry_SYSCALL_64_after_hwframe+0x46/0xb0 Second to last potentially related work creation: kasan_save_stack+0x1e/0x40 __kasan_record_aux_stack+0x9e/0xc0 kvfree_call_rcu+0x84/0x480 timerfd_release+0x82/0x140 __fput+0xfa/0x400 task_work_run+0x80/0xc0 exit_to_user_mode_prepare+0x155/0x160 syscall_exit_to_user_mode+0x12/0x40 do_syscall_64+0x42/0xc0 entry_SYSCALL_64_after_hwframe+0x46/0xb0 The buggy address belongs to the object at ffff889108f3d200 which belongs to the cache kmalloc-256 of size 256 The buggy address is located 0 bytes to the right of 256-byte region [ffff889108f3d200, ffff889108f3d300) The buggy address belongs to the physical page: page:000000007ef2a34c refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1108f3c head:000000007ef2a34c order:2 compound_mapcount:0 compound_pincount:0 flags: 0x4000000000010200(slab|head|zone=2) raw: 4000000000010200 0000000000000000 dead000000000001 ffff889100042b40 raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff889108f3d200: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff889108f3d280: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >ffff889108f3d300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ^ ffff889108f3d380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff889108f3d400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-02md-raid: destroy the bitmap after destroying the threadMikulas Patocka
When we ran the lvm test "shell/integrity-blocksize-3.sh" on a kernel with kasan, we got failure in write_page. The reason for the failure is that md_bitmap_destroy is called before destroying the thread and the thread may be waiting in the function write_page for the bio to complete. When the thread finishes waiting, it executes "if (test_bit(BITMAP_WRITE_ERROR, &bitmap->flags))", which triggers the kasan warning. Note that the commit 48df498daf62 that caused this bug claims that it is neede for md-cluster, you should check md-cluster and possibly find another bugfix for it. BUG: KASAN: use-after-free in write_page+0x18d/0x680 [md_mod] Read of size 8 at addr ffff889162030c78 by task mdX_raid1/5539 CPU: 10 PID: 5539 Comm: mdX_raid1 Not tainted 5.19.0-rc2 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x34/0x44 print_report.cold+0x45/0x57a ? __lock_text_start+0x18/0x18 ? write_page+0x18d/0x680 [md_mod] kasan_report+0xa8/0xe0 ? write_page+0x18d/0x680 [md_mod] kasan_check_range+0x13f/0x180 write_page+0x18d/0x680 [md_mod] ? super_sync+0x4d5/0x560 [dm_raid] ? md_bitmap_file_kick+0xa0/0xa0 [md_mod] ? rs_set_dev_and_array_sectors+0x2e0/0x2e0 [dm_raid] ? mutex_trylock+0x120/0x120 ? preempt_count_add+0x6b/0xc0 ? preempt_count_sub+0xf/0xc0 md_update_sb+0x707/0xe40 [md_mod] md_reap_sync_thread+0x1b2/0x4a0 [md_mod] md_check_recovery+0x533/0x960 [md_mod] raid1d+0xc8/0x2a20 [raid1] ? var_wake_function+0xe0/0xe0 ? psi_group_change+0x411/0x500 ? preempt_count_sub+0xf/0xc0 ? _raw_spin_lock_irqsave+0x78/0xc0 ? __lock_text_start+0x18/0x18 ? raid1_end_read_request+0x2a0/0x2a0 [raid1] ? preempt_count_sub+0xf/0xc0 ? _raw_spin_unlock_irqrestore+0x19/0x40 ? del_timer_sync+0xa9/0x100 ? try_to_del_timer_sync+0xc0/0xc0 ? _raw_spin_lock_irqsave+0x78/0xc0 ? __lock_text_start+0x18/0x18 ? __list_del_entry_valid+0x68/0xa0 ? finish_wait+0xa3/0x100 md_thread+0x161/0x260 [md_mod] ? unregister_md_personality+0xa0/0xa0 [md_mod] ? _raw_spin_lock_irqsave+0x78/0xc0 ? prepare_to_wait_event+0x2c0/0x2c0 ? unregister_md_personality+0xa0/0xa0 [md_mod] kthread+0x148/0x180 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x1f/0x30 </TASK> Allocated by task 5522: kasan_save_stack+0x1e/0x40 __kasan_kmalloc+0x80/0xa0 md_bitmap_create+0xa8/0xe80 [md_mod] md_run+0x777/0x1300 [md_mod] raid_ctr+0x249c/0x4a30 [dm_raid] dm_table_add_target+0x2b0/0x620 [dm_mod] table_load+0x1c8/0x400 [dm_mod] ctl_ioctl+0x29e/0x560 [dm_mod] dm_compat_ctl_ioctl+0x7/0x20 [dm_mod] __do_compat_sys_ioctl+0xfa/0x160 do_syscall_64+0x90/0xc0 entry_SYSCALL_64_after_hwframe+0x46/0xb0 Freed by task 5680: kasan_save_stack+0x1e/0x40 kasan_set_track+0x21/0x40 kasan_set_free_info+0x20/0x40 __kasan_slab_free+0xf7/0x140 kfree+0x80/0x240 md_bitmap_free+0x1c3/0x280 [md_mod] __md_stop+0x21/0x120 [md_mod] md_stop+0x9/0x40 [md_mod] raid_dtr+0x1b/0x40 [dm_raid] dm_table_destroy+0x98/0x1e0 [dm_mod] __dm_destroy+0x199/0x360 [dm_mod] dev_remove+0x10c/0x160 [dm_mod] ctl_ioctl+0x29e/0x560 [dm_mod] dm_compat_ctl_ioctl+0x7/0x20 [dm_mod] __do_compat_sys_ioctl+0xfa/0x160 do_syscall_64+0x90/0xc0 entry_SYSCALL_64_after_hwframe+0x46/0xb0 Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org Fixes: 48df498daf62 ("md: move bitmap_destroy to the beginning of __md_stop") Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-02md: return the allocated devices from md_allocChristoph Hellwig
Two callers of md_alloc want to use the newly allocated devices, so return it instead of letting them find it cumbersomely after the allocation. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-and-tested-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-02md: open code md_probe in autorun_devicesChristoph Hellwig
autorun_devices should not be limited to the controls for the legacy probe on open, so just call md_alloc directly. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-and-tested-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-02md: remove unneeded semicolonYang Li
Eliminate the following coccicheck warning: ./drivers/md/md.c:8208:2-3: Unneeded semicolon Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-02remove the sx8 block driverChristoph Hellwig
This driver is for fairly obscure hardware, and has only seen random drive-by changes after the maintainer stopped working on it in 2005 (about a year and a half after it was introduced). It has some "interesting" block layer interactions, so let's just drop it unless anyone complains. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220721064102.1715460-1-hch@lst.de [axboe: fix date typo, it was in 2005, not 2015] Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-02md: fix build failure for !MODULEStephen Rothwell
After merging the block tree, today's linux-next build (x86_64 allmodconfig) failed like this: drivers/md/md.c:717:22: error: 'mddev_find' defined but not used [-Werror=unused-function] 717 | static struct mddev *mddev_find(dev_t unit) | ^~~~~~~~~~ cc1: all warnings being treated as errors Caused by commit 4500d5c17910 ("md: simplify md_open") Make mddev_find() available only for non-modular builds. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220721131132.070be166@canb.auug.org.au Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-02raid5: fix duplicate checks for rdev->saved_raid_diskJackie Liu
'first' will always be greater than or equal to 0, it is unnecessary to repeat the 0 check, clean it up. Signed-off-by: Jackie Liu <liuyun01@kylinos.cn> Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-02md: simplify md_openChristoph Hellwig
Now that devices are on the all_mddevs list until the gendisk is freed, there can't be any duplicates. Remove the global list lookup and just grab a reference. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-02md: only delete entries from all_mddevs when the disk is freedChristoph Hellwig
This ensures device names don't get prematurely reused. Instead add a deleted flag to skip already deleted devices in mddev_get and other places that only want to see live mddevs. Reported-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>