aboutsummaryrefslogtreecommitdiff
path: root/drivers/infiniband/ulp
AgeCommit message (Collapse)Author
2023-12-13RDMA/rtrs-clt: Remove the warnings for req in_use checkJack Wang
[ Upstream commit 0c8bb6eb70ca41031f663b4481aac9ac78b53bc6 ] As we chain the WR during write request: memory registration, rdma write, local invalidate, if only the last WR fail to send due to send queue overrun, the server can send back the reply, while client mark the req->in_use to false in case of error in rtrs_clt_req when error out from rtrs_post_rdma_write_sg. Fixes: 6a98d71daea1 ("RDMA/rtrs: client: main functionality") Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Reviewed-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Grzegorz Prajsner <grzegorz.prajsner@ionos.com> Link: https://lore.kernel.org/r/20231120154146.920486-8-haris.iqbal@ionos.com Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-12-13RDMA/rtrs-clt: Fix the max_send_wr settingJack Wang
[ Upstream commit 6d09f6f7d7584e099633282ea915988914f86529 ] For each write request, we need Request, Response Memory Registration, Local Invalidate. Fixes: 6a98d71daea1 ("RDMA/rtrs: client: main functionality") Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Reviewed-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Grzegorz Prajsner <grzegorz.prajsner@ionos.com> Link: https://lore.kernel.org/r/20231120154146.920486-7-haris.iqbal@ionos.com Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-12-13RDMA/rtrs-srv: Destroy path files after making sure no IOs in-flightMd Haris Iqbal
[ Upstream commit c4d32e77fc1006f99eeb78417efc3d81a384072a ] Destroying path files may lead to the freeing of rdma_stats. This creates the following race. An IO is in-flight, or has just passed the session state check in process_read/process_write. The close_work gets triggered and the function rtrs_srv_close_work() starts and does destroy path which frees the rdma_stats. After this the function process_read/process_write resumes and tries to update the stats through the function rtrs_srv_update_rdma_stats This commit solves the problem by moving the destroy path function to a later point. This point makes sure any inflights are completed. This is done by qp drain, and waiting for all in-flights through ops_id. Fixes: 9cb837480424 ("RDMA/rtrs: server: main functionality") Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Santosh Kumar Pradhan <santosh.pradhan@ionos.com> Signed-off-by: Grzegorz Prajsner <grzegorz.prajsner@ionos.com> Link: https://lore.kernel.org/r/20231120154146.920486-6-haris.iqbal@ionos.com Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-12-13RDMA/rtrs-srv: Free srv_mr iu only when always_invalidate is trueMd Haris Iqbal
[ Upstream commit 3a71cd6ca0ce33d1af019ecf1d7167406fa54400 ] Since srv_mr->iu is allocated and used only when always_invalidate is true, free it only when always_invalidate is true. Fixes: 9cb837480424 ("RDMA/rtrs: server: main functionality") Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Grzegorz Prajsner <grzegorz.prajsner@ionos.com> Link: https://lore.kernel.org/r/20231120154146.920486-5-haris.iqbal@ionos.com Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-12-13RDMA/rtrs-srv: Check return values while processing info requestMd Haris Iqbal
[ Upstream commit ed1e52aefa16f15dc2f04054a3baf11726a7460e ] While processing info request, it could so happen that the srv_path goes to CLOSING state, cause of any of the error events from RDMA. That state change should be picked up while trying to change the state in process_info_req, by checking the return value. In case the state change call in process_info_req fails, we fail the processing. We should also check the return value for rtrs_srv_path_up, since it sends a link event to the client above, and the client can fail for any reason. Fixes: 9cb837480424 ("RDMA/rtrs: server: main functionality") Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Grzegorz Prajsner <grzegorz.prajsner@ionos.com> Link: https://lore.kernel.org/r/20231120154146.920486-4-haris.iqbal@ionos.com Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-12-13RDMA/rtrs-clt: Start hb after path_upJack Wang
[ Upstream commit 3e44a61b5db873612e20e7b7922468d7d1ac2d22 ] If we start hb too early, it will confuse server side to close the session. Fixes: 6a98d71daea1 ("RDMA/rtrs: client: main functionality") Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Reviewed-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Grzegorz Prajsner <grzegorz.prajsner@ionos.com> Link: https://lore.kernel.org/r/20231120154146.920486-3-haris.iqbal@ionos.com Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-12-13RDMA/rtrs-srv: Do not unconditionally enable irqJack Wang
[ Upstream commit 3ee7ecd712048ade6482bea4b2f3dcaf039c0348 ] When IO is completed, rtrs can be called in softirq context, unconditionally enabling irq could cause panic. To be on safe side, use spin_lock_irqsave and spin_unlock_irqrestore instread. Fixes: 9cb837480424 ("RDMA/rtrs: server: main functionality") Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Florian-Ewald Mueller <florian-ewald.mueller@ionos.com> Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Grzegorz Prajsner <grzegorz.prajsner@ionos.com> Link: https://lore.kernel.org/r/20231120154146.920486-2-haris.iqbal@ionos.com Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-10-10RDMA/srp: Do not call scsi_done() from srp_abort()Bart Van Assche
commit e193b7955dfad68035b983a0011f4ef3590c85eb upstream. After scmd_eh_abort_handler() has called the SCSI LLD eh_abort_handler callback, it performs one of the following actions: * Call scsi_queue_insert(). * Call scsi_finish_command(). * Call scsi_eh_scmd_add(). Hence, SCSI abort handlers must not call scsi_done(). Otherwise all the above actions would trigger a use-after-free. Hence remove the scsi_done() call from srp_abort(). Keep the srp_free_req() call before returning SUCCESS because we may not see the command again if SUCCESS is returned. Cc: Bob Pearson <rpearsonhpe@gmail.com> Cc: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com> Fixes: d8536670916a ("IB/srp: Avoid having aborted requests hang") Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20230823205727.505681-1-bvanassche@acm.org Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-09-13Revert "IB/isert: Fix incorrect release of isert connection"Leon Romanovsky
[ Upstream commit dfe261107c080709459c32695847eec96238852b ] Commit: 699826f4e30a ("IB/isert: Fix incorrect release of isert connection") is causing problems on OPA when DEVICE_REMOVAL is happening. ------------[ cut here ]------------ WARNING: CPU: 52 PID: 2117247 at drivers/infiniband/core/cq.c:359 ib_cq_pool_cleanup+0xac/0xb0 [ib_core] Modules linked in: nfsd nfs_acl target_core_user uio tcm_fc libfc scsi_transport_fc tcm_loop target_core_pscsi target_core_iblock target_core_file rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs rfkill rpcrdma rdma_ucm ib_srpt sunrpc ib_isert iscsi_target_mod target_core_mod opa_vnic ib_iser libiscsi ib_umad scsi_transport_iscsi rdma_cm ib_ipoib iw_cm ib_cm hfi1(-) rdmavt ib_uverbs intel_rapl_msr intel_rapl_common sb_edac ib_core x86_pkg_temp_thermal intel_powerclamp coretemp i2c_i801 mxm_wmi rapl iTCO_wdt ipmi_si iTCO_vendor_support mei_me ipmi_devintf mei intel_cstate ioatdma intel_uncore i2c_smbus joydev pcspkr lpc_ich ipmi_msghandler acpi_power_meter acpi_pad xfs libcrc32c sr_mod sd_mod cdrom t10_pi sg crct10dif_pclmul crc32_pclmul crc32c_intel drm_kms_helper drm_shmem_helper ahci libahci ghash_clmulni_intel igb drm libata dca i2c_algo_bit wmi fuse CPU: 52 PID: 2117247 Comm: modprobe Not tainted 6.5.0-rc1+ #1 Hardware name: Intel Corporation S2600CWR/S2600CW, BIOS SE5C610.86B.01.01.0014.121820151719 12/18/2015 RIP: 0010:ib_cq_pool_cleanup+0xac/0xb0 [ib_core] Code: ff 48 8b 43 40 48 8d 7b 40 48 83 e8 40 4c 39 e7 75 b3 49 83 c4 10 4d 39 fc 75 94 5b 5d 41 5c 41 5d 41 5e 41 5f c3 cc cc cc cc <0f> 0b eb a1 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f RSP: 0018:ffffc10bea13fc80 EFLAGS: 00010206 RAX: 000000000000010c RBX: ffff9bf5c7e66c00 RCX: 000000008020001d RDX: 000000008020001e RSI: fffff175221f9900 RDI: ffff9bf5c7e67640 RBP: ffff9bf5c7e67600 R08: ffff9bf5c7e64400 R09: 000000008020001d R10: 0000000040000000 R11: 0000000000000000 R12: ffff9bee4b1e8a18 R13: dead000000000122 R14: dead000000000100 R15: ffff9bee4b1e8a38 FS: 00007ff1e6d38740(0000) GS:ffff9bfd9fb00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00005652044ecc68 CR3: 0000000889b5c005 CR4: 00000000001706e0 Call Trace: <TASK> ? __warn+0x80/0x130 ? ib_cq_pool_cleanup+0xac/0xb0 [ib_core] ? report_bug+0x195/0x1a0 ? handle_bug+0x3c/0x70 ? exc_invalid_op+0x14/0x70 ? asm_exc_invalid_op+0x16/0x20 ? ib_cq_pool_cleanup+0xac/0xb0 [ib_core] disable_device+0x9d/0x160 [ib_core] __ib_unregister_device+0x42/0xb0 [ib_core] ib_unregister_device+0x22/0x30 [ib_core] rvt_unregister_device+0x20/0x90 [rdmavt] hfi1_unregister_ib_device+0x16/0xf0 [hfi1] remove_one+0x55/0x1a0 [hfi1] pci_device_remove+0x36/0xa0 device_release_driver_internal+0x193/0x200 driver_detach+0x44/0x90 bus_remove_driver+0x69/0xf0 pci_unregister_driver+0x2a/0xb0 hfi1_mod_cleanup+0xc/0x3c [hfi1] __do_sys_delete_module.constprop.0+0x17a/0x2f0 ? exit_to_user_mode_prepare+0xc4/0xd0 ? syscall_trace_enter.constprop.0+0x126/0x1a0 do_syscall_64+0x5c/0x90 ? syscall_exit_to_user_mode+0x12/0x30 ? do_syscall_64+0x69/0x90 ? syscall_exit_work+0x103/0x130 ? syscall_exit_to_user_mode+0x12/0x30 ? do_syscall_64+0x69/0x90 ? exc_page_fault+0x65/0x150 entry_SYSCALL_64_after_hwframe+0x6e/0xd8 RIP: 0033:0x7ff1e643f5ab Code: 73 01 c3 48 8b 0d 75 a8 1b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 45 a8 1b 00 f7 d8 64 89 01 48 RSP: 002b:00007ffec9103cc8 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0 RAX: ffffffffffffffda RBX: 00005615267fdc50 RCX: 00007ff1e643f5ab RDX: 0000000000000000 RSI: 0000000000000800 RDI: 00005615267fdcb8 RBP: 00005615267fdc50 R08: 0000000000000000 R09: 0000000000000000 R10: 00007ff1e659eac0 R11: 0000000000000206 R12: 00005615267fdcb8 R13: 0000000000000000 R14: 00005615267fdcb8 R15: 00007ffec9105ff8 </TASK> ---[ end trace 0000000000000000 ]--- And... restrack: ------------[ cut here ]------------ infiniband hfi1_0: BUG: RESTRACK detected leak of resources restrack: Kernel PD object allocated by ib_isert is not freed restrack: Kernel CQ object allocated by ib_core is not freed restrack: Kernel QP object allocated by rdma_cm is not freed restrack: ------------[ cut here ]------------ Fixes: 699826f4e30a ("IB/isert: Fix incorrect release of isert connection") Reported-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Closes: https://lore.kernel.org/all/921cd1d9-2879-f455-1f50-0053fe6a6655@cornelisnetworks.com Link: https://lore.kernel.org/r/a27982d3235005c58f6d321f3fad5eb6e1beaf9e.1692604607.git.leonro@nvidia.com Tested-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-09-13scsi: RDMA/srp: Fix residual handlingBart Van Assche
[ Upstream commit 89e637c19b2441aabc8dbf22a8745b932fd6996e ] Although the code for residual handling in the SRP initiator follows the SCSI documentation, that documentation has never been correct. Because scsi_finish_command() starts from the data buffer length and subtracts the residual, scsi_set_resid() must not be called if a residual overflow occurs. Hence remove the scsi_set_resid() calls from the SRP initiator if a residual overflow occurrs. Cc: Leon Romanovsky <leon@kernel.org> Cc: Jason Gunthorpe <jgg@nvidia.com> Fixes: 9237f04e12cc ("scsi: core: Fix scsi_get/set_resid() interface") Fixes: e714531a349f ("IB/srp: Fix residual handling") Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20230724200843.3376570-3-bvanassche@acm.org Acked-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-06-21IB/isert: Fix incorrect release of isert connectionSaravanan Vajravel
[ Upstream commit 699826f4e30ab76a62c238c86fbef7e826639c8d ] The ib_isert module is releasing the isert connection both in isert_wait_conn() handler as well as isert_free_conn() handler. In isert_wait_conn() handler, it is expected to wait for iSCSI session logout operation to complete. It should free the isert connection only in isert_free_conn() handler. When a bunch of iSER target is cleared, this issue can lead to use-after-free memory issue as isert conn is twice released Fixes: b02efbfc9a05 ("iser-target: Fix implicit termination of connections") Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Saravanan Vajravel <saravanan.vajravel@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://lore.kernel.org/r/20230606102531.162967-4-saravanan.vajravel@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-06-21IB/isert: Fix possible list corruption in CMA handlerSaravanan Vajravel
[ Upstream commit 7651e2d6c5b359a28c2d4c904fec6608d1021ca8 ] When ib_isert module receives connection error event, it is releasing the isert session and removes corresponding list node but it doesn't take appropriate mutex lock to remove the list node. This can lead to linked list corruption Fixes: bd3792205aae ("iser-target: Fix pending connections handling in target stack shutdown sequnce") Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Saravanan Vajravel <saravanan.vajravel@broadcom.com> Link: https://lore.kernel.org/r/20230606102531.162967-3-saravanan.vajravel@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-06-21IB/isert: Fix dead lock in ib_isertSaravanan Vajravel
[ Upstream commit 691b0480933f0ce88a81ed1d1a0aff340ff6293a ] - When a iSER session is released, ib_isert module is taking a mutex lock and releasing all pending connections. As part of this, ib_isert is destroying rdma cm_id. To destroy cm_id, rdma_cm module is sending CM events to CMA handler of ib_isert. This handler is taking same mutex lock. Hence it leads to deadlock between ib_isert & rdma_cm modules. - For fix, created local list of pending connections and release the connection outside of mutex lock. Calltrace: --------- [ 1229.791410] INFO: task kworker/10:1:642 blocked for more than 120 seconds. [ 1229.791416] Tainted: G OE --------- - - 4.18.0-372.9.1.el8.x86_64 #1 [ 1229.791418] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 1229.791419] task:kworker/10:1 state:D stack: 0 pid: 642 ppid: 2 flags:0x80004000 [ 1229.791424] Workqueue: ib_cm cm_work_handler [ib_cm] [ 1229.791436] Call Trace: [ 1229.791438] __schedule+0x2d1/0x830 [ 1229.791445] ? select_idle_sibling+0x23/0x6f0 [ 1229.791449] schedule+0x35/0xa0 [ 1229.791451] schedule_preempt_disabled+0xa/0x10 [ 1229.791453] __mutex_lock.isra.7+0x310/0x420 [ 1229.791456] ? select_task_rq_fair+0x351/0x990 [ 1229.791459] isert_cma_handler+0x224/0x330 [ib_isert] [ 1229.791463] ? ttwu_queue_wakelist+0x159/0x170 [ 1229.791466] cma_cm_event_handler+0x25/0xd0 [rdma_cm] [ 1229.791474] cma_ib_handler+0xa7/0x2e0 [rdma_cm] [ 1229.791478] cm_process_work+0x22/0xf0 [ib_cm] [ 1229.791483] cm_work_handler+0xf4/0xf30 [ib_cm] [ 1229.791487] ? move_linked_works+0x6e/0xa0 [ 1229.791490] process_one_work+0x1a7/0x360 [ 1229.791491] ? create_worker+0x1a0/0x1a0 [ 1229.791493] worker_thread+0x30/0x390 [ 1229.791494] ? create_worker+0x1a0/0x1a0 [ 1229.791495] kthread+0x10a/0x120 [ 1229.791497] ? set_kthread_struct+0x40/0x40 [ 1229.791499] ret_from_fork+0x1f/0x40 [ 1229.791739] INFO: task targetcli:28666 blocked for more than 120 seconds. [ 1229.791740] Tainted: G OE --------- - - 4.18.0-372.9.1.el8.x86_64 #1 [ 1229.791741] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 1229.791742] task:targetcli state:D stack: 0 pid:28666 ppid: 5510 flags:0x00004080 [ 1229.791743] Call Trace: [ 1229.791744] __schedule+0x2d1/0x830 [ 1229.791746] schedule+0x35/0xa0 [ 1229.791748] schedule_preempt_disabled+0xa/0x10 [ 1229.791749] __mutex_lock.isra.7+0x310/0x420 [ 1229.791751] rdma_destroy_id+0x15/0x20 [rdma_cm] [ 1229.791755] isert_connect_release+0x115/0x130 [ib_isert] [ 1229.791757] isert_free_np+0x87/0x140 [ib_isert] [ 1229.791761] iscsit_del_np+0x74/0x120 [iscsi_target_mod] [ 1229.791776] lio_target_np_driver_store+0xe9/0x140 [iscsi_target_mod] [ 1229.791784] configfs_write_file+0xb2/0x110 [ 1229.791788] vfs_write+0xa5/0x1a0 [ 1229.791792] ksys_write+0x4f/0xb0 [ 1229.791794] do_syscall_64+0x5b/0x1a0 [ 1229.791798] entry_SYSCALL_64_after_hwframe+0x65/0xca Fixes: bd3792205aae ("iser-target: Fix pending connections handling in target stack shutdown sequnce") Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Saravanan Vajravel <saravanan.vajravel@broadcom.com> Link: https://lore.kernel.org/r/20230606102531.162967-2-saravanan.vajravel@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-06-21RDMA/rtrs: Fix rxe_dealloc_pd warningLi Zhijian
[ Upstream commit 9c29c8c7df0688f358d2df5ddd16c97c2f7292b4 ] In current design: 1. PD and clt_path->s.dev are shared among connections. 2. every con[n]'s cleanup phase will call destroy_con_cq_qp() 3. clt_path->s.dev will be always decreased in destroy_con_cq_qp(), and when clt_path->s.dev become zero, it will destroy PD. 4. when con[1] failed to create, con[1] will not take clt_path->s.dev, but it try to decreased clt_path->s.dev So, in case create_cm(con[0]) succeeds but create_cm(con[1]) fails, destroy_con_cq_qp(con[1]) will be called first which will destroy the PD while this PD is still taken by con[0]. Here, we refactor the error path of create_cm() and init_conns(), so that we do the cleanup in the order they are created. The warning occurs when destroying RXE PD whose reference count is not zero. rnbd_client L597: Mapping device /dev/nvme0n1 on session client, (access_mode: rw, nr_poll_queues: 0) ------------[ cut here ]------------ WARNING: CPU: 0 PID: 26407 at drivers/infiniband/sw/rxe/rxe_pool.c:256 __rxe_cleanup+0x13a/0x170 [rdma_rxe] Modules linked in: rpcrdma rdma_ucm ib_iser rnbd_client libiscsi rtrs_client scsi_transport_iscsi rtrs_core rdma_cm iw_cm ib_cm crc32_generic rdma_rxe udp_tunnel ib_uverbs ib_core kmem device_dax nd_pmem dax_pmem nd_vme crc32c_intel fuse nvme_core nfit libnvdimm dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua dm_mirror dm_region_hash dm_log dm_mod CPU: 0 PID: 26407 Comm: rnbd-client.sh Kdump: loaded Not tainted 6.2.0-rc6-roce-flush+ #53 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 RIP: 0010:__rxe_cleanup+0x13a/0x170 [rdma_rxe] Code: 45 84 e4 0f 84 5a ff ff ff 48 89 ef e8 5f 18 71 f9 84 c0 75 90 be c8 00 00 00 48 89 ef e8 be 89 1f fa 85 c0 0f 85 7b ff ff ff <0f> 0b 41 bc ea ff ff ff e9 71 ff ff ff e8 84 7f 1f fa e9 d0 fe ff RSP: 0018:ffffb09880b6f5f0 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff99401f15d6a8 RCX: 0000000000000000 RDX: 0000000000000001 RSI: ffffffffbac8234b RDI: 00000000ffffffff RBP: ffff99401f15d6d0 R08: 0000000000000001 R09: 0000000000000001 R10: 0000000000002d82 R11: 0000000000000000 R12: 0000000000000001 R13: ffff994101eff208 R14: ffffb09880b6f6a0 R15: 00000000fffffe00 FS: 00007fe113904740(0000) GS:ffff99413bc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007ff6cde656c8 CR3: 000000001f108004 CR4: 00000000001706f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> rxe_dealloc_pd+0x16/0x20 [rdma_rxe] ib_dealloc_pd_user+0x4b/0x80 [ib_core] rtrs_ib_dev_put+0x79/0xd0 [rtrs_core] destroy_con_cq_qp+0x8a/0xa0 [rtrs_client] init_path+0x1e7/0x9a0 [rtrs_client] ? __pfx_autoremove_wake_function+0x10/0x10 ? lock_is_held_type+0xd7/0x130 ? rcu_read_lock_sched_held+0x43/0x80 ? pcpu_alloc+0x3dd/0x7d0 ? rtrs_clt_init_stats+0x18/0x40 [rtrs_client] rtrs_clt_open+0x24f/0x5a0 [rtrs_client] ? __pfx_rnbd_clt_link_ev+0x10/0x10 [rnbd_client] rnbd_clt_map_device+0x6a5/0xe10 [rnbd_client] Fixes: 6a98d71daea1 ("RDMA/rtrs: client: main functionality") Link: https://lore.kernel.org/r/1682384563-2-4-git-send-email-lizhijian@fujitsu.com Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Acked-by: Jack Wang <jinpu.wang@ionos.com> Tested-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-06-21RDMA/rtrs: Fix the last iu->buf leak in err pathLi Zhijian
[ Upstream commit 3bf3a7c6985c625f64e73baefdaa36f1c2045a29 ] The last iu->buf will leak if ib_dma_mapping_error() fails. Fixes: c0894b3ea69d ("RDMA/rtrs: core: lib functions shared between client and server modules") Link: https://lore.kernel.org/r/1682384563-2-3-git-send-email-lizhijian@fujitsu.com Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Acked-by: Guoqing Jiang <guoqing.jiang@linux.dev> Acked-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-05-11RDMA/srpt: Add a check for valid 'mad_agent' pointerSaravanan Vajravel
[ Upstream commit eca5cd9474cd26d62f9756f536e2e656d3f62f3a ] When unregistering MAD agent, srpt module has a non-null check for 'mad_agent' pointer before invoking ib_unregister_mad_agent(). This check can pass if 'mad_agent' variable holds an error value. The 'mad_agent' can have an error value for a short window when srpt_add_one() and srpt_remove_one() is executed simultaneously. In srpt module, added a valid pointer check for 'sport->mad_agent' before unregistering MAD agent. This issue can hit when RoCE driver unregisters ib_device Stack Trace: ------------ BUG: kernel NULL pointer dereference, address: 000000000000004d PGD 145003067 P4D 145003067 PUD 2324fe067 PMD 0 Oops: 0002 [#1] PREEMPT SMP NOPTI CPU: 10 PID: 4459 Comm: kworker/u80:0 Kdump: loaded Tainted: P Hardware name: Dell Inc. PowerEdge R640/06NR82, BIOS 2.5.4 01/13/2020 Workqueue: bnxt_re bnxt_re_task [bnxt_re] RIP: 0010:_raw_spin_lock_irqsave+0x19/0x40 Call Trace: ib_unregister_mad_agent+0x46/0x2f0 [ib_core] IPv6: ADDRCONF(NETDEV_CHANGE): bond0: link becomes ready ? __schedule+0x20b/0x560 srpt_unregister_mad_agent+0x93/0xd0 [ib_srpt] srpt_remove_one+0x20/0x150 [ib_srpt] remove_client_context+0x88/0xd0 [ib_core] bond0: (slave p2p1): link status definitely up, 100000 Mbps full duplex disable_device+0x8a/0x160 [ib_core] bond0: active interface up! ? kernfs_name_hash+0x12/0x80 (NULL device *): Bonding Info Received: rdev: 000000006c0b8247 __ib_unregister_device+0x42/0xb0 [ib_core] (NULL device *): Master: mode: 4 num_slaves:2 ib_unregister_device+0x22/0x30 [ib_core] (NULL device *): Slave: id: 105069936 name:p2p1 link:0 state:0 bnxt_re_stopqps_and_ib_uninit+0x83/0x90 [bnxt_re] bnxt_re_alloc_lag+0x12e/0x4e0 [bnxt_re] Fixes: a42d985bd5b2 ("ib_srpt: Initial SRP Target merge for v3.3-rc1") Reviewed-by: Selvin Xavier <selvin.xavier@broadcom.com> Reviewed-by: Kashyap Desai <kashyap.desai@broadcom.com> Signed-off-by: Saravanan Vajravel <saravanan.vajravel@broadcom.com> Link: https://lore.kernel.org/r/20230406042549.507328-1-saravanan.vajravel@broadcom.com Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-05-11scsi: target: iscsit: isert: Alloc per conn cmd counterMike Christie
[ Upstream commit 6d256bee602b131bd4fbc92863b6a1210bcf6325 ] This has iscsit allocate a per conn cmd counter and converts iscsit/isert to use it instead of the per session one. Signed-off-by: Mike Christie <michael.christie@oracle.com> Link: https://lore.kernel.org/r/20230319015620.96006-5-michael.christie@oracle.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Stable-dep-of: 395cee83d02d ("scsi: target: iscsit: Stop/wait on cmds during conn close") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-02-14IB/IPoIB: Fix legacy IPoIB due to wrong number of queuesDragos Tatulea
[ Upstream commit e632291a2dbce45a24cddeb5fe28fe71d724ba43 ] The cited commit creates child PKEY interfaces over netlink will multiple tx and rx queues, but some devices doesn't support more than 1 tx and 1 rx queues. This causes to a crash when traffic is sent over the PKEY interface due to the parent having a single queue but the child having multiple queues. This patch fixes the number of queues to 1 for legacy IPoIB at the earliest possible point in time. BUG: kernel NULL pointer dereference, address: 000000000000036b PGD 0 P4D 0 Oops: 0000 [#1] SMP CPU: 4 PID: 209665 Comm: python3 Not tainted 6.1.0_for_upstream_min_debug_2022_12_12_17_02 #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:kmem_cache_alloc+0xcb/0x450 Code: ce 7e 49 8b 50 08 49 83 78 10 00 4d 8b 28 0f 84 cb 02 00 00 4d 85 ed 0f 84 c2 02 00 00 41 8b 44 24 28 48 8d 4a 01 49 8b 3c 24 <49> 8b 5c 05 00 4c 89 e8 65 48 0f c7 0f 0f 94 c0 84 c0 74 b8 41 8b RSP: 0018:ffff88822acbbab8 EFLAGS: 00010202 RAX: 0000000000000070 RBX: ffff8881c28e3e00 RCX: 00000000064f8dae RDX: 00000000064f8dad RSI: 0000000000000a20 RDI: 0000000000030d00 RBP: 0000000000000a20 R08: ffff8882f5d30d00 R09: ffff888104032f40 R10: ffff88810fade828 R11: 736f6d6570736575 R12: ffff88810081c000 R13: 00000000000002fb R14: ffffffff817fc865 R15: 0000000000000000 FS: 00007f9324ff9700(0000) GS:ffff8882f5d00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000000000036b CR3: 00000001125af004 CR4: 0000000000370ea0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> skb_clone+0x55/0xd0 ip6_finish_output2+0x3fe/0x690 ip6_finish_output+0xfa/0x310 ip6_send_skb+0x1e/0x60 udp_v6_send_skb+0x1e5/0x420 udpv6_sendmsg+0xb3c/0xe60 ? ip_mc_finish_output+0x180/0x180 ? __switch_to_asm+0x3a/0x60 ? __switch_to_asm+0x34/0x60 sock_sendmsg+0x33/0x40 __sys_sendto+0x103/0x160 ? _copy_to_user+0x21/0x30 ? kvm_clock_get_cycles+0xd/0x10 ? ktime_get_ts64+0x49/0xe0 __x64_sys_sendto+0x25/0x30 do_syscall_64+0x3d/0x90 entry_SYSCALL_64_after_hwframe+0x46/0xb0 RIP: 0033:0x7f9374f1ed14 Code: 42 41 f8 ff 44 8b 4c 24 2c 4c 8b 44 24 20 89 c5 44 8b 54 24 28 48 8b 54 24 18 b8 2c 00 00 00 48 8b 74 24 10 8b 7c 24 08 0f 05 <48> 3d 00 f0 ff ff 77 34 89 ef 48 89 44 24 08 e8 68 41 f8 ff 48 8b RSP: 002b:00007f9324ff7bd0 EFLAGS: 00000293 ORIG_RAX: 000000000000002c RAX: ffffffffffffffda RBX: 00007f9324ff7cc8 RCX: 00007f9374f1ed14 RDX: 00000000000002fb RSI: 00007f93000052f0 RDI: 0000000000000030 RBP: 0000000000000000 R08: 00007f9324ff7d40 R09: 000000000000001c R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000000 R13: 000000012a05f200 R14: 0000000000000001 R15: 00007f9374d57bdc </TASK> Fixes: dbc94a0fb817 ("IB/IPoIB: Fix queue count inconsistency for PKEY child interfaces") Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Link: https://lore.kernel.org/r/95eb6b74c7cf49fa46281f9d056d685c9fa11d38.1674584576.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-02-09use less confusing names for iov_iter direction initializersAl Viro
[ Upstream commit de4eda9de2d957ef2d6a8365a01e26a435e958cb ] READ/WRITE proved to be actively confusing - the meanings are "data destination, as used with read(2)" and "data source, as used with write(2)", but people keep interpreting those as "we read data from it" and "we write data to it", i.e. exactly the wrong way. Call them ITER_DEST and ITER_SOURCE - at least that is harder to misinterpret... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Stable-dep-of: 6dd88fd59da8 ("vhost-scsi: unbreak any layout for response") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-02-09READ is "data destination", not source...Al Viro
[ Upstream commit 355d2c2798e9dc39f6714fa7ef8902c0d4c5350b ] Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Stable-dep-of: 6dd88fd59da8 ("vhost-scsi: unbreak any layout for response") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-01-24RDMA/srp: Move large values to a new enum for gcc13Jiri Slaby (SUSE)
[ Upstream commit 56c5dab20a6391604df9521f812c01d1e3fe1bd0 ] Since gcc13, each member of an enum has the same type as the enum [1]. And that is inherited from its members. Provided these two: SRP_TAG_NO_REQ = ~0U, SRP_TAG_TSK_MGMT = 1U << 31 all other members are unsigned ints. Esp. with SRP_MAX_SGE and SRP_TSK_MGMT_SQ_SIZE and their use in min(), this results in the following warnings: include/linux/minmax.h:20:35: error: comparison of distinct pointer types lacks a cast drivers/infiniband/ulp/srp/ib_srp.c:563:42: note: in expansion of macro 'min' include/linux/minmax.h:20:35: error: comparison of distinct pointer types lacks a cast drivers/infiniband/ulp/srp/ib_srp.c:2369:27: note: in expansion of macro 'min' So move the large values away to a separate enum, so that they don't affect other members. [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=36113 Link: https://lore.kernel.org/r/20221212120411.13750-1-jirislaby@kernel.org Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-12-31IB/IPoIB: Fix queue count inconsistency for PKEY child interfacesDragos Tatulea
[ Upstream commit dbc94a0fb81771a38733c0e8f2ea8c4fa6934dc1 ] There are 2 ways to create IPoIB PKEY child interfaces: 1) Writing a PKEY to /sys/class/net/<ib parent interface>/create_child. 2) Using netlink with iproute. While with sysfs the child interface has the same number of tx and rx queues as the parent, with netlink there will always be 1 tx and 1 rx queue for the child interface. That's because the get_num_tx/rx_queues() netlink ops are missing and the default value of 1 is taken for the number of queues (in rtnl_create_link()). This change adds the get_num_tx/rx_queues() ops which allows for interfaces with multiple queues to be created over netlink. This constant only represents the max number of tx and rx queues on that net device. Fixes: 9baa0b036410 ("IB/ipoib: Add rtnl_link_ops support") Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Link: https://lore.kernel.org/r/f4a42c8aa43c02d5ae5559a60c3e5e0f18c82531.1670485816.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-12-31RDMA/srp: Fix error return code in srp_parse_options()Wang Yufen
[ Upstream commit ed461b30b22c8fa85c25189c14cb89f29595cd14 ] In the previous iteration of the while loop, the "ret" may have been assigned a value of 0, so the error return code -EINVAL may have been incorrectly set to 0. To fix set valid return code before calling to goto. Also investigate each case separately as Andy suggessted. Fixes: e711f968c49c ("IB/srp: replace custom implementation of hex2bin()") Fixes: 2a174df0c602 ("IB/srp: Use kstrtoull() instead of simple_strtoull()") Fixes: 19f313438c77 ("IB/srp: Add RDMA/CM support") Signed-off-by: Wang Yufen <wangyufen@huawei.com> Link: https://lore.kernel.org/r/1669953638-11747-2-git-send-email-wangyufen@huawei.com Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-10-11treewide: use get_random_u32() when possibleJason A. Donenfeld
The prandom_u32() function has been a deprecated inline wrapper around get_random_u32() for several releases now, and compiles down to the exact same code. Replace the deprecated wrapper with a direct call to the real function. The same also applies to get_random_int(), which is just a wrapper around get_random_u32(). This was done as a basic find and replace. Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Yury Norov <yury.norov@gmail.com> Reviewed-by: Jan Kara <jack@suse.cz> # for ext4 Acked-by: Toke Høiland-Jørgensen <toke@toke.dk> # for sch_cake Acked-by: Chuck Lever <chuck.lever@oracle.com> # for nfsd Acked-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com> # for thunderbolt Acked-by: Darrick J. Wong <djwong@kernel.org> # for xfs Acked-by: Helge Deller <deller@gmx.de> # for parisc Acked-by: Heiko Carstens <hca@linux.ibm.com> # for s390 Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-10-11treewide: use prandom_u32_max() when possible, part 1Jason A. Donenfeld
Rather than incurring a division or requesting too many random bytes for the given range, use the prandom_u32_max() function, which only takes the minimum required bytes from the RNG and avoids divisions. This was done mechanically with this coccinelle script: @basic@ expression E; type T; identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32"; typedef u64; @@ ( - ((T)get_random_u32() % (E)) + prandom_u32_max(E) | - ((T)get_random_u32() & ((E) - 1)) + prandom_u32_max(E * XXX_MAKE_SURE_E_IS_POW2) | - ((u64)(E) * get_random_u32() >> 32) + prandom_u32_max(E) | - ((T)get_random_u32() & ~PAGE_MASK) + prandom_u32_max(PAGE_SIZE) ) @multi_line@ identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32"; identifier RAND; expression E; @@ - RAND = get_random_u32(); ... when != RAND - RAND %= (E); + RAND = prandom_u32_max(E); // Find a potential literal @literal_mask@ expression LITERAL; type T; identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32"; position p; @@ ((T)get_random_u32()@p & (LITERAL)) // Add one to the literal. @script:python add_one@ literal << literal_mask.LITERAL; RESULT; @@ value = None if literal.startswith('0x'): value = int(literal, 16) elif literal[0] in '123456789': value = int(literal, 10) if value is None: print("I don't know how to handle %s" % (literal)) cocci.include_match(False) elif value == 2**32 - 1 or value == 2**31 - 1 or value == 2**24 - 1 or value == 2**16 - 1 or value == 2**8 - 1: print("Skipping 0x%x for cleanup elsewhere" % (value)) cocci.include_match(False) elif value & (value + 1) != 0: print("Skipping 0x%x because it's not a power of two minus one" % (value)) cocci.include_match(False) elif literal.startswith('0x'): coccinelle.RESULT = cocci.make_expr("0x%x" % (value + 1)) else: coccinelle.RESULT = cocci.make_expr("%d" % (value + 1)) // Replace the literal mask with the calculated result. @plus_one@ expression literal_mask.LITERAL; position literal_mask.p; expression add_one.RESULT; identifier FUNC; @@ - (FUNC()@p & (LITERAL)) + prandom_u32_max(RESULT) @collapse_ret@ type T; identifier VAR; expression E; @@ { - T VAR; - VAR = (E); - return VAR; + return E; } @drop_var@ type T; identifier VAR; @@ { - T VAR; ... when != VAR } Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Yury Norov <yury.norov@gmail.com> Reviewed-by: KP Singh <kpsingh@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> # for ext4 and sbitmap Reviewed-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> # for drbd Acked-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Heiko Carstens <hca@linux.ibm.com> # for s390 Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # for mmc Acked-by: Darrick J. Wong <djwong@kernel.org> # for xfs Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-10-06Merge tag 'v6.0' into rdma.git for-nextJason Gunthorpe
Trvial merge conflicts against rdma.git for-rc resolved matching linux-next: drivers/infiniband/hw/hns/hns_roce_hw_v2.c drivers/infiniband/hw/hns/hns_roce_main.c https://lore.kernel.org/r/20220929124005.105149-1-broonie@kernel.org Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-09-22RDMA/srp: Support more than 255 rdma portsMikhael Goikhman
Currently ib_srp module does not support devices with more than 256 ports. Switch from u8 to u32 to fix the problem. Fixes: 1fb7f8973f51 ("RDMA: Support more than 255 rdma ports") Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Mikhael Goikhman <migo@nvidia.com> Link: https://lore.kernel.org/r/7d80d8844f1abb3a54170b7259f0a02be38080a6.1663747327.git.leonro@nvidia.com Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-09-22RDMA/cma: Multiple path records support with netlink channelMark Zhang
Support receiving inbound and outbound IB path records (along with GMP PathRecord) from user-space service through the RDMA netlink channel. The LIDs in these 3 PRs can be used in this way: 1. GMP PR: used as the standard local/remote LIDs; 2. DLID of outbound PR: Used as the "dlid" field for outbound traffic; 3. DLID of inbound PR: Used as the "dlid" field for outbound traffic in responder side. This is aimed to support adaptive routing. With current IB routing solution when a packet goes out it's assigned with a fixed DLID per target, meaning a fixed router will be used. The LIDs in inbound/outbound path records can be used to identify group of routers that allow communication with another subnet's entity. With them packets from an inter-subnet connection may travel through any router in the set to reach the target. As confirmed with Jason, when sending a netlink request, kernel uses LS_RESOLVE_PATH_USE_ALL so that the service knows kernel supports multiple PRs. Signed-off-by: Mark Zhang <markzhang@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Link: https://lore.kernel.org/r/2fa2b6c93c4c16c8915bac3cfc4f27be1d60519d.1662631201.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-09-20RDMA/srpt: Use flex array destination for memcpy()Hangyu Hua
In preparation for FORTIFY_SOURCE performing run-time destination buffer bounds checking for memcpy(), specify the destination output buffer explicitly, instead of asking memcpy() to write past the end of what looked like a fixed-size object. Notice that srp_rsp[] is a pointer to a structure that contains flexible-array member data[]: struct srp_rsp { ... __be32 sense_data_len; __be32 resp_data_len; u8 data[]; }; link: https://github.com/KSPP/linux/issues/201 Signed-off-by: Hangyu Hua <hbh25y@gmail.com> Link: https://lore.kernel.org/r/20220909022943.8896-1-hbh25y@gmail.com Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-09-20RDMA/srp: Fix srp_abort()Bart Van Assche
Fix the code for converting a SCSI command pointer into an SRP request pointer. Cc: Xiao Yang <yangx.jy@fujitsu.com> Fixes: ad215aaea4f9 ("RDMA/srp: Make struct scsi_cmnd and struct srp_request adjacent") Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20220908233139.3042628-1-bvanassche@acm.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-09-06RDMA/rtrs-clt: Kill xchg_pathsGuoqing Jiang
Let's call try_cmpxchg directly for the same purpose. Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Link: https://lore.kernel.org/r/20220903040252.29397-1-guoqing.jiang@linux.dev Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-09-06RDMA/rtrs-clt: Break the loop once one path is connectedGuoqing Jiang
No need to iterate all paths after find one connected path. Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Link: https://lore.kernel.org/r/20220902101922.26273-3-guoqing.jiang@linux.dev Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-09-06RDMA/rtrs: Update comments for MAX_SESS_QUEUE_DEPTHGuoqing Jiang
The maximum queue_depth should be 65535 per check_module_params, also update other relevant comments. Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Link: https://lore.kernel.org/r/20220902101922.26273-2-guoqing.jiang@linux.dev Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-09-01RDMA/srp: Set scmnd->result only when scmnd is not NULLyangx.jy@fujitsu.com
This change fixes the following kernel NULL pointer dereference which is reproduced by blktests srp/007 occasionally. BUG: kernel NULL pointer dereference, address: 0000000000000170 PGD 0 P4D 0 Oops: 0002 [#1] PREEMPT SMP NOPTI CPU: 0 PID: 9 Comm: kworker/0:1H Kdump: loaded Not tainted 6.0.0-rc1+ #37 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.15.0-29-g6a62e0cb0dfe-prebuilt.qemu.org 04/01/2014 Workqueue: 0x0 (kblockd) RIP: 0010:srp_recv_done+0x176/0x500 [ib_srp] Code: 00 4d 85 ff 0f 84 52 02 00 00 48 c7 82 80 02 00 00 00 00 00 00 4c 89 df 4c 89 14 24 e8 53 d3 4a f6 4c 8b 14 24 41 0f b6 42 13 <41> 89 87 70 01 00 00 41 0f b6 52 12 f6 c2 02 74 44 41 8b 42 1c b9 RSP: 0018:ffffaef7c0003e28 EFLAGS: 00000282 RAX: 0000000000000000 RBX: ffff9bc9486dea60 RCX: 0000000000000000 RDX: 0000000000000102 RSI: ffffffffb76bbd0e RDI: 00000000ffffffff RBP: ffff9bc980099a00 R08: 0000000000000001 R09: 0000000000000001 R10: ffff9bca53ef0000 R11: ffff9bc980099a10 R12: ffff9bc956e14000 R13: ffff9bc9836b9cb0 R14: ffff9bc9557b4480 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff9bc97ec00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000170 CR3: 0000000007e04000 CR4: 00000000000006f0 Call Trace: <IRQ> __ib_process_cq+0xb7/0x280 [ib_core] ib_poll_handler+0x2b/0x130 [ib_core] irq_poll_softirq+0x93/0x150 __do_softirq+0xee/0x4b8 irq_exit_rcu+0xf7/0x130 sysvec_apic_timer_interrupt+0x8e/0xc0 </IRQ> Fixes: ad215aaea4f9 ("RDMA/srp: Make struct scsi_cmnd and struct srp_request adjacent") Link: https://lore.kernel.org/r/20220831081626.18712-1-yangx.jy@fujitsu.com Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com> Acked-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-08-30IB/cm: Remove the service_mask parameter from ib_cm_listen()Mark Zhang
Remove the service_mask parameter of ib_cm_listen(), as all callers use 0. Link: https://lore.kernel.org/r/20220819090859.957943-2-markzhang@nvidia.com Signed-off-by: Mark Zhang <markzhang@nvidia.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-08-30RDMA/rtrs: Remove 'dir' argument from rnbd_srv_rdma_evGuoqing Jiang
Since process_{read,write} already prints direction info if ctx->ops.rdma_ev fails, no need to pass 'dir'. Link: https://lore.kernel.org/r/20220826081117.21687-1-guoqing.jiang@linux.dev Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-08-28RDMA/srp: Use the attribute group mechanism for sysfs attributesBart Van Assche
Simplify the SRP driver by using the attribute group mechanism instead of calling device_create_file() explicitly. Link: https://lore.kernel.org/r/20220825213900.864587-5-bvanassche@acm.org Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-08-28RDMA/srp: Handle dev_set_name() failureBart Van Assche
Instead of ignoring dev_set_name() failure, handle dev_set_name() failure. Convert a device_register() call into device_initialize() and device_add() calls. Link: https://lore.kernel.org/r/20220825213900.864587-4-bvanassche@acm.org Reported-by: Bo Liu <liubo03@inspur.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-08-28RDMA/srp: Remove the srp_host.released completionBart Van Assche
Move the kfree(host) calls into srp_release_dev(). Convert a device_unregister() call into a device_del() and a device_put() call. Remove the host->released completion object. This patch prepares for handling dev_set_name() failure in srp_add_port(). Link: https://lore.kernel.org/r/20220825213900.864587-3-bvanassche@acm.org Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-08-28RDMA/srp: Rework the srp_add_port() error pathBart Van Assche
device_register() always calls device_initialize() so calling device_del() is safe even if device_register() fails. Implement the following advice from the comment block above device_register(): "NOTE: _Never_ directly free @dev after calling this function, even if it returned an error! Always use put_device() to give up the reference initialized in this function instead." Keep the kfree() call in the error path since srp_release_dev() does not free the host. Link: https://lore.kernel.org/r/20220825213900.864587-2-bvanassche@acm.org Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-08-21IB: move from strlcpy with unused retval to strscpyWolfram Sang
Follow the advice of the below link and prefer 'strscpy' in this subsystem. Conversion is 1:1 because the return value is not used. Generated by a coccinelle script. Link: https://lore.kernel.org/r/CAHk-=wgfRnXz0W3D37d01q3JFkr_i_uTL=V6A6G1oUZcprmknw@mail.gmail.com/ Link: https://lore.kernel.org/r/20220818210018.6841-1-wsa+renesas@sang-engineering.com Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-08-21RDMA/rtrs-clt: Output sg index when warning onJack Wang
Output the sg index, so it's a bit easier for debug. Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Reviewed-by: Aleksei Marov <aleksei.marov@ionos.com> Link: https://lore.kernel.org/r/20220818105355.110344-2-haris.iqbal@ionos.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-08-21RDMA/rtrs-srv: Pass the correct number of entries for dma mapped SGLJack Wang
ib_dma_map_sg() augments the SGL into a 'dma mapped SGL'. This process may change the number of entries and the lengths of each entry. Code that touches dma_address is iterating over the 'dma mapped SGL' and must use dma_nents which returned from ib_dma_map_sg(). We should use the return count from ib_dma_map_sg for futher usage. Fixes: 9cb837480424e ("RDMA/rtrs: server: main functionality") Link: https://lore.kernel.org/r/20220818105355.110344-4-haris.iqbal@ionos.com Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Reviewed-by: Aleksei Marov <aleksei.marov@ionos.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-08-21RDMA/rtrs-clt: Use the right sg_cnt after ib_dma_map_sgJack Wang
When iommu is enabled, we hit warnings like this: WARNING: at rtrs/rtrs.c:178 rtrs_iu_post_rdma_write_imm+0x9b/0x110 rtrs warn on one sge entry length is 0, which is unexpected. The problem is ib_dma_map_sg augments the SGL into a 'dma mapped SGL'. This process may change the number of entries and the lengths of each entry. Code that touches dma_address is iterating over the 'dma mapped SGL' and must use dma_nents which returned from ib_dma_map_sg(). So pass the count return from ib_dma_map_sg. Fixes: 6a98d71daea1 ("RDMA/rtrs: client: main functionality") Link: https://lore.kernel.org/r/20220818105355.110344-3-haris.iqbal@ionos.com Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Reviewed-by: Aleksei Marov <aleksei.marov@ionos.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-08-21RDMA/rtrs-srv: Add event tracing supportSantosh Pradhan
Add event tracing mechanism for following routines: - send_io_resp_imm() How to use: 1. Load the rtrs_server module 2. cd /sys/kernel/debug/tracing 3. If all the events need to be enabled: echo 1 > events/rtrs_srv/enable 4. OR only speific routine/event needs to be enabled e.g. echo 1 > events/rtrs_srv/send_io_resp_imm/enable 5. cat trace 6. Run some I/O workload which can trigger send_io_resp_imm() Link: https://lore.kernel.org/r/20220818105240.110234-3-haris.iqbal@ionos.com Signed-off-by: Santosh Pradhan <santosh.pradhan@ionos.com> Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-08-21RDMA/rtrs-clt: Add event tracing supportSantosh Pradhan
Add event tracing mechanism for following routines: - rtrs_clt_reconnect_work() - rtrs_clt_close_conns() - rtrs_rdma_error_recovery() How to use: 1. Load the rtrs_client module 2. cd /sys/kernel/debug/tracing 3. If all the events need to be enabled: echo 1 > events/rtrs_clt/enable 4. OR only speific routine/event needs to be enabled e.g. echo 1 > events/rtrs_clt/rtrs_clt_close_conns/enable 5. cat trace 6. Run some workload which can trigger rtrs_clt_close_conns() Link: https://lore.kernel.org/r/20220818105240.110234-2-haris.iqbal@ionos.com Signed-off-by: Santosh Pradhan <santosh.pradhan@ionos.com> Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-08-16IB/iser: Fix login with authenticationSergey Gorenko
The iSER Initiator uses two types of receive buffers: - one big login buffer posted by iser_post_recvl(); - several small message buffers posted by iser_post_recvm(). The login buffer is used at the login phase and full feature phase in the discovery session. It may take a few requests and responses to complete the login phase. The message buffers are only used in the normal operational session at the full feature phase. After the commit referred in the fixes line, the login operation fails if the authentication is enabled. That happens because the Initiator posts a small receive buffer after the first response from Target. So, the next send operation fails because Target's second response does not fit into the small receive buffer. This commit adds additional checks to prevent posting small receive buffers until the full feature phase. Fixes: 39b169ea0d36 ("IB/iser: Fix RNR errors") Link: https://lore.kernel.org/r/20220805060135.18493-1-sergeygo@nvidia.com Signed-off-by: Sergey Gorenko <sergeygo@nvidia.com> Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-08-04Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds
Pull rdma updates from Jason Gunthorpe: "This cycle we got a new RDMA driver "ERDMA" for the Alibaba cloud environment. Otherwise the changes are dominated by rxe fixes. There is another RDMA driver on the list that might get merged next cycle, 'MANA' for the Azure cloud environment. Summary: - Bug fixes and small features for irdma, hns, siw, qedr, hfi1, mlx5 - General spelling/grammer fixes - rdma cm can follow changes in neighbours for control packets - Significant amounts of rxe fixes and spec compliance changes - Use the modern NAPI API - Use the bitmap API instead of open coding - Performance improvements for rtrs - Add the ERDMA driver for Alibaba cloud - Fix a use after free bug in SRP" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (99 commits) RDMA/ib_srpt: Unify checking rdma_cm_id condition in srpt_cm_req_recv() RDMA/rxe: Fix error unwind in rxe_create_qp() RDMA/mlx5: Add missing check for return value in get namespace flow RDMA/rxe: Split qp state for requester and completer RDMA/rxe: Generate error completion for error requester QP state RDMA/rxe: Update wqe_index for each wqe error completion RDMA/srpt: Fix a use-after-free RDMA/srpt: Introduce a reference count in struct srpt_device RDMA/srpt: Duplicate port name members IB/qib: Fix repeated "in" within comments RDMA/erdma: Add driver to kernel build environment RDMA/erdma: Add the ABI definitions RDMA/erdma: Add the erdma module RDMA/erdma: Add connection management (CM) support RDMA/erdma: Add verbs implementation RDMA/erdma: Add verbs header file RDMA/erdma: Add event queue implementation RDMA/erdma: Add cmdq implementation RDMA/erdma: Add main include file RDMA/erdma: Add the hardware related definitions ...
2022-08-04Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds
Pull SCSI updates from James Bottomley: "Updates to the usual drivers (ufs, qla2xx, target, lpfc, smartpqi, mpi3mr). The main driver change that might cause issues on down the road is the conversion of some of our oldest surviving drivers to the DMA API (should only affect m68k). The only major core change is the rework of async resume; the rest are either completely trivial or for updating deprecated APIs" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (195 commits) scsi: target: Remove XDWRITEREAD emulated support scsi: megaraid: Remove the static variable initialisation scsi: ch: Do not initialise statics to 0 scsi: ufs: core: Fix spelling mistake "Cannnot" -> "Cannot" scsi: target: iscsi: Do not require target authentication scsi: target: iscsi: Allow AuthMethod=None scsi: target: iscsi: Support base64 in CHAP scsi: target: iscsi: Add support for extended CDB AHS scsi: ufs: dt-bindings: Add SC8280XP binding scsi: target: iscsi: Fix clang -Wformat warnings scsi: ufs: core: Read device property for ref clock scsi: libsas: Resume SAS host for phy reset or enable via sysfs scsi: hisi_sas: Modify v3 HW SATA completion error processing scsi: hisi_sas: Relocate DMA unmap of SMP task scsi: hisi_sas: Remove unnecessary variable to hold DMA map elements scsi: hisi_sas: Call hisi_sas_slave_configure() from slave_configure_v3_hw() scsi: mpi3mr: Delete a stray tab scsi: mpi3mr: Unlock on error path scsi: mpi3mr: Reduce VD queue depth on detecting throttling scsi: mpi3mr: Resource Based Metering ...
2022-08-03Merge tag 'net-next-6.0' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking changes from Paolo Abeni: "Core: - Refactor the forward memory allocation to better cope with memory pressure with many open sockets, moving from a per socket cache to a per-CPU one - Replace rwlocks with RCU for better fairness in ping, raw sockets and IP multicast router. - Network-side support for IO uring zero-copy send. - A few skb drop reason improvements, including codegen the source file with string mapping instead of using macro magic. - Rename reference tracking helpers to a more consistent netdev_* schema. - Adapt u64_stats_t type to address load/store tearing issues. - Refine debug helper usage to reduce the log noise caused by bots. BPF: - Improve socket map performance, avoiding skb cloning on read operation. - Add support for 64 bits enum, to match types exposed by kernel. - Introduce support for sleepable uprobes program. - Introduce support for enum textual representation in libbpf. - New helpers to implement synproxy with eBPF/XDP. - Improve loop performances, inlining indirect calls when possible. - Removed all the deprecated libbpf APIs. - Implement new eBPF-based LSM flavor. - Add type match support, which allow accurate queries to the eBPF used types. - A few TCP congetsion control framework usability improvements. - Add new infrastructure to manipulate CT entries via eBPF programs. - Allow for livepatch (KLP) and BPF trampolines to attach to the same kernel function. Protocols: - Introduce per network namespace lookup tables for unix sockets, increasing scalability and reducing contention. - Preparation work for Wi-Fi 7 Multi-Link Operation (MLO) support. - Add support to forciby close TIME_WAIT TCP sockets via user-space tools. - Significant performance improvement for the TLS 1.3 receive path, both for zero-copy and not-zero-copy. - Support for changing the initial MTPCP subflow priority/backup status - Introduce virtually contingus buffers for sockets over RDMA, to cope better with memory pressure. - Extend CAN ethtool support with timestamping capabilities - Refactor CAN build infrastructure to allow building only the needed features. Driver API: - Remove devlink mutex to allow parallel commands on multiple links. - Add support for pause stats in distributed switch. - Implement devlink helpers to query and flash line cards. - New helper for phy mode to register conversion. New hardware / drivers: - Ethernet DSA driver for the rockchip mt7531 on BPI-R2 Pro. - Ethernet DSA driver for the Renesas RZ/N1 A5PSW switch. - Ethernet DSA driver for the Microchip LAN937x switch. - Ethernet PHY driver for the Aquantia AQR113C EPHY. - CAN driver for the OBD-II ELM327 interface. - CAN driver for RZ/N1 SJA1000 CAN controller. - Bluetooth: Infineon CYW55572 Wi-Fi plus Bluetooth combo device. Drivers: - Intel Ethernet NICs: - i40e: add support for vlan pruning - i40e: add support for XDP framented packets - ice: improved vlan offload support - ice: add support for PPPoE offload - Mellanox Ethernet (mlx5) - refactor packet steering offload for performance and scalability - extend support for TC offload - refactor devlink code to clean-up the locking schema - support stacked vlans for bridge offloads - use TLS objects pool to improve connection rate - Netronome Ethernet NICs (nfp): - extend support for IPv6 fields mangling offload - add support for vepa mode in HW bridge - better support for virtio data path acceleration (VDPA) - enable TSO by default - Microsoft vNIC driver (mana) - add support for XDP redirect - Others Ethernet drivers: - bonding: add per-port priority support - microchip lan743x: extend phy support - Fungible funeth: support UDP segmentation offload and XDP xmit - Solarflare EF100: add support for virtual function representors - MediaTek SoC: add XDP support - Mellanox Ethernet/IB switch (mlxsw): - dropped support for unreleased H/W (XM router). - improved stats accuracy - unified bridge model coversion improving scalability (parts 1-6) - support for PTP in Spectrum-2 asics - Broadcom PHYs - add PTP support for BCM54210E - add support for the BCM53128 internal PHY - Marvell Ethernet switches (prestera): - implement support for multicast forwarding offload - Embedded Ethernet switches: - refactor OcteonTx MAC filter for better scalability - improve TC H/W offload for the Felix driver - refactor the Microchip ksz8 and ksz9477 drivers to share the probe code (parts 1, 2), add support for phylink mac configuration - Other WiFi: - Microchip wilc1000: diable WEP support and enable WPA3 - Atheros ath10k: encapsulation offload support Old code removal: - Neterion vxge ethernet driver: this is untouched since more than 10 years" * tag 'net-next-6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1890 commits) doc: sfp-phylink: Fix a broken reference wireguard: selftests: support UML wireguard: allowedips: don't corrupt stack when detecting overflow wireguard: selftests: update config fragments wireguard: ratelimiter: use hrtimer in selftest net/mlx5e: xsk: Discard unaligned XSK frames on striding RQ net: usb: ax88179_178a: Bind only to vendor-specific interface selftests: net: fix IOAM test skip return code net: usb: make USB_RTL8153_ECM non user configurable net: marvell: prestera: remove reduntant code octeontx2-pf: Reduce minimum mtu size to 60 net: devlink: Fix missing mutex_unlock() call net/tls: Remove redundant workqueue flush before destroy net: txgbe: Fix an error handling path in txgbe_probe() net: dsa: Fix spelling mistakes and cleanup code Documentation: devlink: add add devlink-selftests to the table of contents dccp: put dccp_qpolicy_full() and dccp_qpolicy_push() in the same lock net: ionic: fix error check for vlan flags in ionic_set_nic_features() net: ice: fix error NETIF_F_HW_VLAN_CTAG_FILTER check in ice_vsi_sync_fltr() nfp: flower: add support for tunnel offload without key ID ...