From 35ed9613d83f3c1f011877d591fd7d36f2666106 Mon Sep 17 00:00:00 2001 From: James Smart Date: Wed, 16 Mar 2022 20:27:34 -0700 Subject: scsi: lpfc: Improve PCI EEH Error and Recovery Handling Following EEH errors, the driver can crash or hang when deleting the localport or when attempting to unload. The EEH handlers in the driver did not notify the NVMe-FC transport before tearing the driver down. This was delayed until the resume steps. This worked for SCSI because lpfc_block_scsi() would notify the scsi_fc_transport that the target was not available but it would not clean up all the references to the ndlp. The SLI3 prep for dev reset handler did the lpfc_offline_prep() and lpfc_offline() calls to get the port stopped before restarting. The SLI4 version of the prep for dev reset just destroyed the queues and did not stop NVMe from continuing. Also because the port was not really stopped the localport destroy would hang because the transport was still waiting for I/O. Additionally, a devloss tmo can fire and post events to a stopped worker thread creating another hang condition. lpfc_sli4_prep_dev_for_reset() is modified to call lpfc_offline_prep() and lpfc_offline() rather than just lpfc_scsi_dev_block() to ensure both SCSI and NVMe transports are notified to block I/O to the driver. Logic is added to devloss handler and worker thread to clean up ndlp references and quiesce appropriately. Link: https://lore.kernel.org/r/20220317032737.45308-2-jsmart2021@gmail.com Co-developed-by: Justin Tee Signed-off-by: Justin Tee Signed-off-by: James Smart Signed-off-by: Martin K. Petersen --- drivers/scsi/lpfc/lpfc.h | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) (limited to 'drivers/scsi/lpfc/lpfc.h') diff --git a/drivers/scsi/lpfc/lpfc.h b/drivers/scsi/lpfc/lpfc.h index 86653aa9b389..8405fd0bbc59 100644 --- a/drivers/scsi/lpfc/lpfc.h +++ b/drivers/scsi/lpfc/lpfc.h @@ -896,6 +896,11 @@ enum lpfc_irq_chann_mode { NHT_MODE, }; +enum lpfc_hba_bit_flags { + FABRIC_COMANDS_BLOCKED, + HBA_PCI_ERR, +}; + struct lpfc_hba { /* SCSI interface function jump table entries */ struct lpfc_io_buf * (*lpfc_get_scsi_buf) @@ -1042,7 +1047,6 @@ struct lpfc_hba { * Firmware supports Forced Link Speed * capability */ -#define HBA_PCI_ERR 0x80000 /* The PCI slot is offline */ #define HBA_FLOGI_ISSUED 0x100000 /* FLOGI was issued */ #define HBA_SHORT_CMF 0x200000 /* shorter CMF timer routine */ #define HBA_CGN_DAY_WRAP 0x400000 /* HBA Congestion info day wraps */ @@ -1349,7 +1353,6 @@ struct lpfc_hba { atomic_t fabric_iocb_count; struct timer_list fabric_block_timer; unsigned long bit_flags; -#define FABRIC_COMANDS_BLOCKED 0 atomic_t num_rsrc_err; atomic_t num_cmd_success; unsigned long last_rsrc_error_time; -- cgit v1.2.3