Age Commit message Author
2021-12-23scsi: mpi3mr: Detect async reset that occurred in firmwareSreekanth Reddy
Detect an asynchronous reset that occurred in the firmware by polling the reset history bit of the IOC status register. If that bit is set, the driver waits for the controller to become ready and then re-initializes the controller. Also reduce the time the driver waits for the controller to acknowledge a reset action after issuing it. The wait time is reduced from 510 seconds to 30 seconds. If the controller doesn't acknowledge a specific reset action within that interval, the driver marks the controller as unrecoverable instead of retrying two more times before giving up. Link: https://lore.kernel.org/r/20211220141159.16117-17-sreekanth.reddy@broadcom.com Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
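The bounded wait described in this commit can be modelled in userspace roughly as follows. This is only a sketch, not the driver's code: the names `wait_for_reset_ack`, `ack_fn`, and `ack_after` are hypothetical, the 30-second limit comes from the commit message, and the 100 ms poll interval is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>

#define RESET_ACK_TIMEOUT_SEC 30 /* limit stated in the commit message */
#define POLLS_PER_SEC 10         /* assumption: one poll every 100 ms */

enum ctrl_state { CTRL_READY, CTRL_UNRECOVERABLE };

/* Hypothetical callback: returns true once the reset is acknowledged. */
typedef bool (*ack_fn)(void *ctx);

/*
 * Poll until the reset is acknowledged or the time budget is used up.
 * There is no retry of the reset itself: on timeout the controller is
 * reported as unrecoverable. `delay` may be NULL (useful in tests).
 */
static enum ctrl_state wait_for_reset_ack(ack_fn acked, void *ctx,
                                          void (*delay)(void))
{
    for (int i = 0; i < RESET_ACK_TIMEOUT_SEC * POLLS_PER_SEC; i++) {
        if (acked(ctx))
            return CTRL_READY;
        if (delay != NULL)
            delay();
    }
    return CTRL_UNRECOVERABLE;
}

/* Demo callback: acknowledges after *ctx unsuccessful polls. */
static bool ack_after(void *ctx)
{
    int *remaining = ctx;
    return (*remaining)-- <= 0;
}
```

Passing a real sleep function as `delay` turns the loop into an actual 30-second wait; with `NULL` it only counts polls.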
2021-12-23scsi: mpi3mr: Add IOC reinit functionSreekanth Reddy
Add IOC reinitialization function. Link: https://lore.kernel.org/r/20211220141159.16117-16-sreekanth.reddy@broadcom.com Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-12-23scsi: mpi3mr: Handle offline FW activation in graceful mannerSreekanth Reddy
Currently the driver marks the controller as unrecoverable if there is an asynchronous reset or fault during the initialization, reinitialization post reset, and OS resume. Enhance driver to retry the initialization, re-initialization, and resume sequences for a maximum of 3 times if the controller became faulty or asynchronously reset due to a firmware activation during the initialization sequence. Link: https://lore.kernel.org/r/20211220141159.16117-15-sreekanth.reddy@broadcom.com Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
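A bounded retry of this kind might look like the sketch below. It is a userspace model under stated assumptions: all names are hypothetical, and "a maximum of 3 times" is read here as three attempts in total, which the commit message does not spell out.

```c
#define MAX_INIT_ATTEMPTS 3 /* assumption: three attempts in total */

enum init_result { INIT_OK, INIT_RETRYABLE, INIT_FATAL };

/* Hypothetical single-shot init routine supplied by the caller. */
typedef enum init_result (*init_fn)(void *ctx);

/*
 * Run the init sequence, retrying only when it failed because of a
 * fault or asynchronous reset (INIT_RETRYABLE). Returns 0 on success
 * and -1 when the attempts are exhausted or the failure is fatal.
 */
static int init_with_retry(init_fn init_once, void *ctx)
{
    for (int attempt = 1; attempt <= MAX_INIT_ATTEMPTS; attempt++) {
        enum init_result r = init_once(ctx);

        if (r == INIT_OK)
            return 0;
        if (r == INIT_FATAL)
            break;
    }
    return -1;
}

/* Demo init routine: fails *ctx times with a retryable error. */
static enum init_result flaky_init(void *ctx)
{
    int *fails_left = ctx;

    if (*fails_left > 0) {
        (*fails_left)--;
        return INIT_RETRYABLE;
    }
    return INIT_OK;
}
```

Separating retryable from fatal results keeps the loop from repeating an initialization that cannot succeed.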
2021-12-23scsi: mpi3mr: Code refactor of IOC init - part2Sreekanth Reddy
Move the IOC initialization's bring up logic to mpi3mr_bring_ioc_ready() routine. Link: https://lore.kernel.org/r/20211220141159.16117-14-sreekanth.reddy@broadcom.com Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-12-23scsi: mpi3mr: Code refactor of IOC init - part1Sreekanth Reddy
Separate out reply and sense buffer allocation and initialization into two routines and call only initialization routine while issuing the IOC Init request message. Also move out the event enable logic to a separate function. Link: https://lore.kernel.org/r/20211220141159.16117-13-sreekanth.reddy@broadcom.com Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-12-23scsi: mpi3mr: Fault IOC when internal command gets timeoutSreekanth Reddy
Save snapdump and fault the controller with the given reason code if it is not already in the fault state or in asynchronous reset. This ensures that soft reset is issued from the watchdog thread. This will also be used to handle initialization-time faults/resets/timeouts, as in those cases immediate soft reset invocation is not required. Link: https://lore.kernel.org/r/20211220141159.16117-12-sreekanth.reddy@broadcom.com Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-12-23scsi: mpi3mr: Display IOC firmware package versionSreekanth Reddy
Display IOC firmware package version by reading component image upload data. Link: https://lore.kernel.org/r/20211220141159.16117-11-sreekanth.reddy@broadcom.com Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-12-23scsi: mpi3mr: Handle unaligned PLL in unmap cmndsSreekanth Reddy
The following special handling is needed for UNMAP commands issued to NVMe drives: - On B0 boards, if the parameter list length is greater than 24 and not a 16-byte multiple, then truncate the parameter list length to a 16-byte multiple. - On A0 boards, if the parameter list length is greater than block descriptor data length + 8, then truncate the parameter list length to block descriptor data length + 8 value. Link: https://lore.kernel.org/r/20211220141159.16117-10-sreekanth.reddy@broadcom.com Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
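The two truncation rules above can be sketched as a small helper. This is an illustration rather than the driver's implementation: the function and enum names are hypothetical, and the B0 rule is interpreted as aligning the block descriptor portion (the length minus the 8-byte UNMAP header) to 16 bytes, on the assumption that a well-formed UNMAP parameter list is 8 + 16n bytes long.

```c
#include <stdint.h>

#define UNMAP_HDR_LEN 8u   /* UNMAP parameter list header size */
#define UNMAP_DESC_LEN 16u /* size of one UNMAP block descriptor */

enum board_rev { BOARD_A0, BOARD_B0 }; /* hypothetical revision tag */

/*
 * Return the parameter list length to use for an UNMAP command.
 * bd_data_len is the block descriptor data length taken from the
 * UNMAP header; it is only consulted for A0 boards.
 */
static uint16_t unmap_trunc_len(enum board_rev rev, uint16_t param_len,
                                uint16_t bd_data_len)
{
    if (rev == BOARD_B0) {
        /* Assumption: alignment applies to the descriptor portion. */
        uint16_t rem = (uint16_t)((param_len - UNMAP_HDR_LEN) % UNMAP_DESC_LEN);

        if (param_len > 24 && rem != 0)
            return (uint16_t)(param_len - rem);
        return param_len;
    }

    /* A0: never exceed header plus the advertised descriptor data. */
    if (param_len > bd_data_len + UNMAP_HDR_LEN)
        return (uint16_t)(bd_data_len + UNMAP_HDR_LEN);
    return param_len;
}
```

For example, on a B0 board a 45-byte list would be cut to 40 bytes (header plus two descriptors), while a 40-byte list would be left alone.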
2021-12-23scsi: mpi3mr: Increase internal cmnds timeout to 60sSreekanth Reddy
- Increase internal command timeout to 60 seconds. - Enable 16 device removal handshake processing in parallel in the device removal handshake infrastructure. Link: https://lore.kernel.org/r/20211220141159.16117-9-sreekanth.reddy@broadcom.com Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-12-23scsi: mpi3mr: Do access status validation before adding devicesSreekanth Reddy
Add validation for various access statuses prior to exposing attached target device to the operating system. Link: https://lore.kernel.org/r/20211220141159.16117-8-sreekanth.reddy@broadcom.com Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-12-23scsi: mpi3mr: Add support for PCIe Managed Switch SES deviceSreekanth Reddy
The SAS4 Controller firmware exposes the SES devices in Managed PCIe Switch as a PCIe Device Type SCSI Device (MPI3_DEVICE0_PCIE_DEVICE_INFO_TYPE_SCSI_DEVICE). Driver is enhanced to handle this device type by: - Exposing the device to the upper layers and - Not updating any hardware sectors & virtual boundary settings as these settings are needed only for NVMe devices. Link: https://lore.kernel.org/r/20211220141159.16117-7-sreekanth.reddy@broadcom.com Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-12-23scsi: mpi3mr: Update MPI3 headers - part2Sreekanth Reddy
Continued updating MPI3 headers. Link: https://lore.kernel.org/r/20211220141159.16117-6-sreekanth.reddy@broadcom.com Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-12-23scsi: mpi3mr: Update MPI3 headers - part1Sreekanth Reddy
Update MPI3 headers. Link: https://lore.kernel.org/r/20211220141159.16117-5-sreekanth.reddy@broadcom.com Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-12-23scsi: mpi3mr: Don't reset IOC if cmnds flush with reset statusSreekanth Reddy
Don't issue the soft reset if internal commands are flushed out with reset status. Soft reset needs to be issued only if commands are really timed out. Link: https://lore.kernel.org/r/20211220141159.16117-4-sreekanth.reddy@broadcom.com Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-12-23scsi: mpi3mr: Replace spin_lock() with spin_lock_irqsave()Sreekanth Reddy
Use spin_lock_irqsave() instead of spin_lock() while acquiring reply_free_queue_lock & sbq_lock locks. Link: https://lore.kernel.org/r/20211220141159.16117-3-sreekanth.reddy@broadcom.com Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-12-23scsi: mpi3mr: Add debug APIs based on logging_level bitsSreekanth Reddy
Add debug print functions which will print messages based on logging_level bits enabled. Link: https://lore.kernel.org/r/20211220141159.16117-2-sreekanth.reddy@broadcom.com Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
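A bitmask-gated debug print helper of the kind described could be modelled in userspace as below. The bit names, the `dbg_print` function, and the use of `stderr` are all assumptions for illustration; the driver's real macros and bit assignments are not shown in this log.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical logging_level bits. */
#define DBG_INIT  (1u << 0)
#define DBG_RESET (1u << 1)
#define DBG_ERROR (1u << 2)

/* Enabled message classes; in a driver this would be a tunable. */
static unsigned int logging_level = DBG_ERROR;

/*
 * Print the message only if one of the bits in `mask` is enabled in
 * logging_level. Returns the number of characters written, or 0 when
 * the message was filtered out.
 */
static int dbg_print(unsigned int mask, const char *fmt, ...)
{
    va_list ap;
    int n;

    if (!(logging_level & mask))
        return 0;

    va_start(ap, fmt);
    n = vfprintf(stderr, fmt, ap);
    va_end(ap);
    return n;
}
```

Setting `logging_level = DBG_INIT | DBG_RESET` would then let `dbg_print(DBG_RESET, "reset issued\n")` through while suppressing `DBG_ERROR`-only messages.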
2021-12-22scsi: pmcraid: Don't use GFP_DMA in pmcraid_alloc_sglist()Christoph Hellwig
The driver doesn't express DMA addressing limitation under 32-bits anywhere else, so remove the spurious GFP_DMA allocation. Link: https://lore.kernel.org/r/20211222092247.928711-1-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-12-22scsi: snic: Don't use GFP_DMA in snic_queue_report_tgt_req()Christoph Hellwig
The driver doesn't express DMA addressing limitation under 32-bits anywhere else, so remove the spurious GFP_DMA allocation. Link: https://lore.kernel.org/r/20211222092048.925829-1-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-12-22scsi: myrs: Don't use GFP_DMAChristoph Hellwig
The myrs devices support 64-bit addressing, so remove the spurious GFP_DMA allocations. Link: https://lore.kernel.org/r/20211222091935.925624-1-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-12-22scsi: myrb: Don't use GFP_DMA in myrb_pdev_slave_alloc()Christoph Hellwig
The driver doesn't express DMA addressing limitation under 32-bits anywhere else, so remove the spurious GFP_DMA allocation. Link: https://lore.kernel.org/r/20211222091801.924745-1-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-12-22scsi: initio: Don't use GFP_DMA in initio_probe_one()Christoph Hellwig
The driver doesn't express DMA addressing limitation under 32-bits anywhere else, so remove the spurious GFP_DMA allocation. Link: https://lore.kernel.org/r/20211222091630.922788-1-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-12-22scsi: sr: Don't use GFP_DMAChristoph Hellwig
The allocated buffers are used as a command payload, for which the block layer and/or DMA API do the proper bounce buffering if needed. Link: https://lore.kernel.org/r/20211222090842.920724-1-hch@lst.de Reported-by: Baoquan He <bhe@redhat.com> Reviewed-by: Baoquan He <bhe@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-12-22scsi: ch: Don't use GFP_DMAChristoph Hellwig
The allocated buffers are used as a command payload, for which the block layer and/or DMA API do the proper bounce buffering if needed. Link: https://lore.kernel.org/r/20211222090311.916624-1-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-12-22scsi: hisi_sas: Use autosuspend for the host controllerXiang Chen
The controller may frequently enter and exit suspend for each I/O which we need to deal with. This is inefficient and may cause too much suspend and resume activity for the controller. To avoid this, use a default 5s autosuspend for the controller to stop frequently suspending and resuming. This value may still be modified via sysfs interfaces. Link: https://lore.kernel.org/r/1639999298-244569-16-git-send-email-chenxiang66@hisilicon.com Acked-by: John Garry <john.garry@huawei.com> Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-12-22scsi: libsas: Keep host active while processing eventsXiang Chen
Processing events such as PORTE_BROADCAST_RCVD may cause dependency issues for runtime power management support. One such problem is that handling a PORTE_BROADCAST_RCVD event requires that the host is resumed to send SMP commands. However, in resuming the host, the phyup events generated from re-enabling the phys are processed in the same workqueue as the original PORTE_BROADCAST_RCVD event. As such, the host will never finish resuming (as it waits for the phyup event processing), and then the PORTE_BROADCAST_RCVD event can't be processed as the SMP commands are blocked, and so we have a deadlock. Solve this problem by ensuring that libsas keeps the host active until it has completely finished processing phy or port events, such as PORTE_BYTES_DMAED. As such, we don't have to worry about resuming the host for processing individual SMP commands in this example. Link: https://lore.kernel.org/r/1639999298-244569-15-git-send-email-chenxiang66@hisilicon.com Reviewed-by: John Garry <john.garry@huawei.com> Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-12-22scsi: hisi_sas: Keep controller active between ISR of phyup and the event being processedXiang Chen
It is possible that the controller may become suspended between processing a phyup interrupt and the event being processed by libsas. As such, we can't ensure the controller is active when processing the phyup event - this may cause the phyup event to be lost or other issues. To avoid any possible issues, add pm_runtime_get_noresume() in the phyup interrupt handler and pm_runtime_put_sync() in the work handler exit to ensure that we always stay active. Since we only want to call pm_runtime_get_noresume() for v3 hw, signal this with a new event, HISI_PHYE_PHY_UP_PM. Link: https://lore.kernel.org/r/1639999298-244569-14-git-send-email-chenxiang66@hisilicon.com Acked-by: John Garry <john.garry@huawei.com> Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-12-22scsi: libsas: Defer works of new phys during suspendXiang Chen
During the processing of event PORTE_BYTES_DMAED, the driver queues work DISCE_DISCOVER_DOMAIN and then flushes workqueue ha->disco_q. If a new phyup event occurs while the controller is resuming, the work PORTE_BYTES_DMAED of the new phy occurs before that of the suspended phys. The work DISCE_DISCOVER_DOMAIN of the new phy requires an active SAS controller (it needs to resume the SAS controller via function scsi_sysfs_add_sdev() and some other functions such as add_device_link()). However, the activation of the SAS controller requires completion of work PORTE_BYTES_DMAED of the suspended phys, while it is blocked by the new phy's work on ha->event_q. So there is a deadlock and it is released only after resume timeout. To solve the issue, defer works of new phys during suspend and queue those deferred works after the SAS controller becomes active. Link: https://lore.kernel.org/r/1639999298-244569-13-git-send-email-chenxiang66@hisilicon.com Reviewed-by: John Garry <john.garry@huawei.com> Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-12-22scsi: libsas: Refactor sas_queue_deferred_work()Xiang Chen
In the second part of function __sas_drain_work(), deferred work is queued. This functionality is required in other places, so factor it out into the function sas_queue_deferred_work(). Link: https://lore.kernel.org/r/1639999298-244569-12-git-send-email-chenxiang66@hisilicon.com Reviewed-by: John Garry <john.garry@huawei.com> Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-12-22scsi: libsas: Add flag SAS_HA_RESUMINGXiang Chen
Add a flag SAS_HA_RESUMING and use it to indicate the state of resuming the host controller. Link: https://lore.kernel.org/r/1639999298-244569-11-git-send-email-chenxiang66@hisilicon.com Reviewed-by: John Garry <john.garry@huawei.com> Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-12-22scsi: libsas: Resume host while sending SMP I/OsXiang Chen
When sending SMP I/Os to the host we need to ensure that the host is not suspended and can process the commands. This is a better approach than relying on the host to resume itself to handle such commands. Use pm_runtime_get_sync() and pm_runtime_put_sync() calls for the host when executing SMP I/Os. Link: https://lore.kernel.org/r/1639999298-244569-10-git-send-email-chenxiang66@hisilicon.com Reviewed-by: John Garry <john.garry@huawei.com> Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-12-22scsi: hisi_sas: Add more logs for runtime suspend/resumeXiang Chen
Add some logs at the beginning and end of suspend/resume. Link: https://lore.kernel.org/r/1639999298-244569-9-git-send-email-chenxiang66@hisilicon.com Acked-by: John Garry <john.garry@huawei.com> Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-12-22scsi: libsas: Insert PORTE_BROADCAST_RCVD event for resuming hostXiang Chen
If a new disk is inserted through an expander when the host was suspended, it will not necessarily be detected as the topology is not re-scanned during resume. To detect possible changes in topology during suspension, insert a PORTE_BROADCAST_RCVD event per port when resuming to trigger a revalidation. Link: https://lore.kernel.org/r/1639999298-244569-8-git-send-email-chenxiang66@hisilicon.com Reviewed-by: John Garry <john.garry@huawei.com> Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-12-22scsi: mvsas: Add spin_lock/unlock() to protect asd_sas_port->phy_listXiang Chen
phy_list_lock is not held when using asd_sas_port->phy_list in the mvsas driver. Add spin_lock/unlock in those places. Link: https://lore.kernel.org/r/1639999298-244569-7-git-send-email-chenxiang66@hisilicon.com Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-12-22scsi: hisi_sas: Fix some issues related to asd_sas_port->phy_listXiang Chen
Most places that use asd_sas_port->phy_list are protected by spinlock asd_sas_port->phy_list_lock, however there are still some places which miss grabbing the lock. Add it in function hisi_sas_refresh_port_id() when accessing asd_sas_port->phy_list. This carries a risk that list mutates while at the same time dropping the lock in function hisi_sas_send_ata_reset_each_phy(). Read asd_sas_port->phy_mask instead of accessing asd_sas_port->phy_list to avoid this risk. Link: https://lore.kernel.org/r/1639999298-244569-6-git-send-email-chenxiang66@hisilicon.com Acked-by: John Garry <john.garry@huawei.com> Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
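Reading a mask instead of walking a list works because the mask can be copied in one step and then examined without holding any lock. The sketch below shows the idea in plain C; `for_each_phy_in_mask`, `phy_fn`, and `collect_phy` are hypothetical names, and a 32-bit mask width is assumed.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-phy callback. */
typedef void (*phy_fn)(int phy_no, void *ctx);

/*
 * Visit every phy whose bit is set in phy_mask. The mask is passed by
 * value, so the caller works on a snapshot and later changes to the
 * port's phy membership cannot disturb the loop. Returns the number
 * of phys visited.
 */
static int for_each_phy_in_mask(uint32_t phy_mask, phy_fn fn, void *ctx)
{
    int count = 0;

    for (int phy_no = 0; phy_mask != 0; phy_no++, phy_mask >>= 1) {
        if (phy_mask & 1u) {
            if (fn != NULL)
                fn(phy_no, ctx);
            count++;
        }
    }
    return count;
}

/* Demo callback: records each visited phy number as a bit in *ctx. */
static void collect_phy(int phy_no, void *ctx)
{
    uint32_t *seen = ctx;
    *seen |= 1u << phy_no;
}
```

The trade-off is that the snapshot may be slightly stale by the time a phy is processed, which is acceptable when the per-phy action tolerates a phy that has since left the port.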
2021-12-22scsi: libsas: Add spin_lock/unlock() to protect asd_sas_port->phy_listXiang Chen
Most places that use asd_sas_port->phy_list in libsas are protected by spinlock asd_sas_port->phy_list_lock. However, there are still a few places which miss the lock. Add it in those places. Link: https://lore.kernel.org/r/1639999298-244569-5-git-send-email-chenxiang66@hisilicon.com Reviewed-by: John Garry <john.garry@huawei.com> Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-12-22scsi: block: pm: Always set request queue runtime active in blk_post_runtime_resume()Alan Stern
John Garry reported a deadlock that occurs when trying to access a runtime-suspended SATA device. For obscure reasons, the rescan procedure causes the link to be hard-reset, which disconnects the device. The rescan tries to carry out a runtime resume when accessing the device. scsi_rescan_device() holds the SCSI device lock and won't release it until it can put commands onto the device's block queue. This can't happen until the queue is successfully runtime-resumed or the device is unregistered. But the runtime resume fails because the device is disconnected, and __scsi_remove_device() can't do the unregistration because it can't get the device lock. The best way to resolve this deadlock appears to be to allow the block queue to start running again even after an unsuccessful runtime resume. The idea is that the driver or the SCSI error handler will need to be able to use the queue to resolve the runtime resume failure. This patch removes the err argument to blk_post_runtime_resume() and makes the routine act as though the resume was successful always. This fixes the deadlock. Link: https://lore.kernel.org/r/1639999298-244569-4-git-send-email-chenxiang66@hisilicon.com Fixes: e27829dc92e5 ("scsi: serialize ->rescan against ->remove") Reported-and-tested-by: John Garry <john.garry@huawei.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-12-22scsi: Revert "scsi: hisi_sas: Filter out new PHY up events during suspend"John Garry
This reverts commit b14a37e011d829404c29a5ae17849d7efb034893. In that commit, we had to filter out phy-up events during suspend, as they would cause a deadlock between processing the phyup event and the resume HA function trying to drain the HA event workqueue to complete the resume process. Now that we no longer try to drain the HA event queue during the HA resume process, the deadlock would not occur, so remove the special handling for it. Link: https://lore.kernel.org/r/1639999298-244569-3-git-send-email-chenxiang66@hisilicon.com Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-12-22scsi: libsas: Don't always drain event workqueue for HA resumeJohn Garry
For the hisi_sas driver, if a directly attached disk is removed during suspend, a hang will occur in the resume process. The background is that in commit 16fd4a7c5917 ("scsi: hisi_sas: Add device link between SCSI devices and hisi_hba"), it is ensured that the HBA device cannot be runtime suspended when any SCSI device associated is active. Other drivers which use libsas don't worry about this as none support runtime suspend. The mentioned hang occurs when a disk is removed during suspend. In the removal process - from PHYE_RESUME_TIMEOUT event processing - we call into scsi_remove_device(), which is being processed in the HA event workqueue. Here we wait for all suppliers of the SCSI device to resume, which includes the HBA device (from the above commit). However the HBA device cannot resume, as it is waiting for the PHYE_RESUME_TIMEOUT to be processed (from calling sas_resume_ha() -> sas_drain_work()). This is the deadlock. There does not appear to be any need for the sas_drain_work() to be called at all in sas_resume_ha() as it is not syncing against anything, so allow LLDDs to avoid this by providing a variant of sas_resume_ha() which does not "sync", i.e. doesn't drain the event workqueue. Link: https://lore.kernel.org/r/1639999298-244569-2-git-send-email-chenxiang66@hisilicon.com Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-12-16scsi: libsas: Decode SAM status and host byte codesJohn Garry
Value 0 is used for SAM status and libsas exec_status byte codes in sas_end_task() - use defined macros instead. In addition, change to proper enum types. Also replace SAM_STAT_CHECK_CONDITION with SAS_SAM_STAT_CHECK_CONDITION, the latter being a proper member of enum exec_status. Link: https://lore.kernel.org/r/1639579061-179473-9-git-send-email-john.garry@huawei.com Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-12-16scsi: hisi_sas: Fix phyup timeout on FPGAQi Liu
The OOB interrupt and phyup interrupt handlers may run out-of-order in high CPU usage scenarios. Since the hisi_sas_phy.timer is added in hisi_sas_phy_oob_ready() and disarmed in phy_up_v3_hw(), this out-of-order execution will cause hisi_sas_phy.timer timeout to trigger. To solve, protect hisi_sas_phy.timer and .attached with a lock, and ensure that the timer won't be added after phyup handler completes. Link: https://lore.kernel.org/r/1639579061-179473-8-git-send-email-john.garry@huawei.com Signed-off-by: Qi Liu <liuqi115@huawei.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-12-16scsi: hisi_sas: Prevent parallel FLR and controller resetQi Liu
If we issue a controller reset command while executing an FLR, a hung task may be found: Call trace: __switch_to+0x158/0x1cc __schedule+0x2e8/0x85c schedule+0x7c/0x110 schedule_timeout+0x190/0x1cc __down+0x7c/0xd4 down+0x5c/0x7c hisi_sas_task_exec+0x510/0x680 [hisi_sas_main] hisi_sas_queue_command+0x24/0x30 [hisi_sas_main] smp_execute_task_sg+0xf4/0x23c [libsas] sas_smp_phy_control+0x110/0x1e0 [libsas] transport_sas_phy_reset+0xc8/0x190 [libsas] phy_reset_work+0x2c/0x40 [libsas] process_one_work+0x1dc/0x48c worker_thread+0x15c/0x464 kthread+0x160/0x170 ret_from_fork+0x10/0x18 This is a race condition which occurs when the FLR completes first. Here the host HISI_SAS_RESETTING_BIT flag gets out of sync as HISI_SAS_RESETTING_BIT is not always cleared with the hisi_hba.sem held, so now only set/unset HISI_SAS_RESETTING_BIT under hisi_hba.sem. Link: https://lore.kernel.org/r/1639579061-179473-7-git-send-email-john.garry@huawei.com Signed-off-by: Qi Liu <liuqi115@huawei.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-12-16scsi: hisi_sas: Prevent parallel controller reset and control phy commandQi Liu
A user may issue a control phy command from sysfs at any time, even if the controller is resetting. If a phy is disabled by hardreset/linkreset command before calling get_phys_state() in the reset path, the saved phy state may be incorrect. To avoid incorrectly recording the phy state, use hisi_hba.sem to ensure that the controller reset may not run at the same time as when the phy control function is running. Link: https://lore.kernel.org/r/1639579061-179473-6-git-send-email-john.garry@huawei.com Signed-off-by: Qi Liu <liuqi115@huawei.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-12-16scsi: hisi_sas: Factor out task prep and delivery codeJohn Garry
The task prep code is the same between the normal path (in hisi_sas_task_prep()) and the internal abort path, so factor it out into a common function. Link: https://lore.kernel.org/r/1639579061-179473-5-git-send-email-john.garry@huawei.com Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-12-16scsi: hisi_sas: Pass abort structure for internal abortJohn Garry
To help factor out code in future, it's useful to know if we're executing an internal abort, so pass a pointer to the structure. The idea is that a NULL pointer means not an internal abort. Link: https://lore.kernel.org/r/1639579061-179473-4-git-send-email-john.garry@huawei.com Reviewed-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-12-16scsi: hisi_sas: Make internal abort have no task protoJohn Garry
For an internal abort, the task does not have a protocol, so set to none. This will make it easier to differentiate internal abort tasks in future. Link: https://lore.kernel.org/r/1639579061-179473-3-git-send-email-john.garry@huawei.com Reviewed-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-12-16scsi: hisi_sas: Start delivery hisi_sas_task_exec() directlyJohn Garry
Currently we start delivery of commands to the DQ after returning from hisi_sas_task_exec() with success. Let's just start delivery directly in that function without having to check if some local variable is set. Link: https://lore.kernel.org/r/1639579061-179473-2-git-send-email-john.garry@huawei.com Reviewed-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-12-16scsi: efct: Don't pass GFP_DMA to dma_alloc_coherent()Christoph Hellwig
dma_alloc_coherent() ignores the zone specifiers so this is pointless and confusing. Link: https://lore.kernel.org/r/20211214163605.416288-1-hch@lst.de Reviewed-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-12-16scsi: ufs: core: Fix deadlock issue in ufshcd_wait_for_doorbell_clr()Bean Huo
Calling shost_for_each_device() while holding host->host_lock causes a deadlock, which stalls the system (log as follows). Fix this issue by using __shost_for_each_device() in ufshcd_pending_cmds(). stalls on CPUs/tasks: Call trace: __switch_to+0x120/0x170 0xffff800011643998 Task dump for CPU 5: task:kworker/u16:2 state:R running task stack: 0 pid: 80 ppid: 2 flags:0x0000000a Workqueue: events_unbound async_run_entry_fn Call trace: __switch_to+0x120/0x170 0x0 Task dump for CPU 6: task:kworker/u16:6 state:R running task stack: 0 pid: 164 ppid: 2 flags:0x0000000a Workqueue: events_unbound async_run_entry_fn Call trace: __switch_to+0x120/0x170 0xffff54e7c4429f80 Task dump for CPU 7: task:kworker/u16:4 state:R running task stack: 0 pid: 153 ppid: 2 flags:0x0000000a Workqueue: events_unbound async_run_entry_fn Call trace: __switch_to+0x120/0x170 blk_mq_run_hw_queue+0x34/0x110 blk_mq_sched_insert_request+0xb0/0x120 blk_execute_rq_nowait+0x68/0x88 blk_execute_rq+0x4c/0xd8 __scsi_execute+0xec/0x1d0 scsi_vpd_inquiry+0x84/0xf0 scsi_get_vpd_buf+0x34/0xb8 scsi_attach_vpd+0x34/0x140 scsi_probe_and_add_lun+0xa6c/0xab8 __scsi_scan_target+0x438/0x4f8 scsi_scan_channel+0x6c/0xa8 scsi_scan_host_selected+0xf0/0x150 do_scsi_scan_host+0x88/0x90 scsi_scan_host+0x1b4/0x1d0 ufshcd_async_scan+0x248/0x310 async_run_entry_fn+0x30/0x178 process_one_work+0x1e8/0x368 worker_thread+0x40/0x478 kthread+0x174/0x180 ret_from_fork+0x10/0x20 Link: https://lore.kernel.org/r/20211214120537.531628-1-huobean@gmail.com Fixes: 8d077ede48c1 ("scsi: ufs: Optimize the command queueing code") Reported-by: YongQin Liu <yongqin.liu@linaro.org> Reported-by: Amit Pundir <amit.pundir@linaro.org> Tested-by: John Stultz <john.stultz@linaro.org> Tested-by: Bjorn Andersson <bjorn.andersson@linaro.org> Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org> Co-developed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Bean Huo <beanhuo@micron.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-12-16scsi: qla2xxx: Synchronize rport dev_loss_tmo settingHannes Reinecke
Currently, the dev_loss_tmo setting is only ever used for SCSI devices. This patch reshuffles initialisation such that the SCSI remote ports are registered before the NVMe ones, allowing the dev_loss_tmo setting to be synchronized between SCSI and NVMe. Link: https://lore.kernel.org/r/20211214111139.52503-1-dwagner@suse.de Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-12-16Merge branch '5.16/scsi-fixes' into 5.17/scsi-stagingMartin K. Petersen
Pull in the 5.16 fixes branch to resolve a conflict in the UFS driver core. Conflicts: drivers/scsi/ufs/ufshcd.c Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>