aboutsummaryrefslogtreecommitdiff
path: root/arch
AgeCommit message (Collapse)Author
2021-08-10powerpc: use IRQF_NO_DEBUG for IPIsCédric Le Goater
There is no need to use the lockup detector ("noirqdebug") for IPIs. The ipistorm benchmark measures a ~10% improvement on high systems when this flag is set. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210719130614.195886-1-clg@kaod.org
2021-08-10powerpc/xive: Use XIVE domain under xmon and debugfsCédric Le Goater
The default domain of the PCI/MSIs is not the XIVE domain anymore. To list the IRQ mappings under XMON and debugfs, query the IRQ data from the low level XIVE domain. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210701132750.1475580-32-clg@kaod.org
2021-08-10KVM: PPC: Book3S HV: XICS: Fix mapping of passthrough interruptsCédric Le Goater
PCI MSIs now live in an MSI domain but the underlying calls, which will EOI the interrupt in real mode, need an HW IRQ number mapped in the XICS IRQ domain. Grab it there. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210701132750.1475580-31-clg@kaod.org
2021-08-10powerpc/powernv/pci: Rework pnv_opal_pci_msi_eoi()Cédric Le Goater
pnv_opal_pci_msi_eoi() is called from KVM to EOI passthrough interrupts when in real mode. Adding MSI domain broke the hack using the 'ioda.irq_chip' field to deduce the owning PHB. Fix that by using the IRQ chip data in the MSI domain. The 'ioda.irq_chip' field is now unused and could be removed from the pnv_phb struct. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210701132750.1475580-30-clg@kaod.org
2021-08-10powerpc/powernv/pci: Set the IRQ chip data for P8/CXL devicesCédric Le Goater
Before MSI domains, the default IRQ chip of PHB3 MSIs was patched by pnv_set_msi_irq_chip() with the custom EOI handler pnv_ioda2_msi_eoi() and the owning PHB was deduced from the 'ioda.irq_chip' field. This path has been deprecated by the MSI domains but it is still in use by the P8 CAPI 'cxl' driver. Rewriting this driver to support MSI would be a waste of time. Nevertheless, we can still remove the IRQ chip patch and set the IRQ chip data instead. This is cleaner. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210701132750.1475580-29-clg@kaod.org
2021-08-10powerpc/xics: Fix IRQ migrationCédric Le Goater
desc->irq_data points to the top level IRQ data descriptor which is not necessarily in the XICS IRQ domain. MSIs are in another domain for instance. Fix that by looking for a mapping on the low level XICS IRQ domain. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210701132750.1475580-28-clg@kaod.org
2021-08-10powerpc/powernv/pci: Adapt is_pnv_opal_msi() to detect passthrough interruptCédric Le Goater
The pnv_ioda2_msi_eoi() chip handler is not used anymore for MSIs. Simply use the check on the PSI-MSI chip. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210701132750.1475580-27-clg@kaod.org
2021-08-10powerpc/powernv/pci: Drop unused MSI codeCédric Le Goater
MSIs should be fully managed by the PCI and IRQ subsystems now. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210701132750.1475580-26-clg@kaod.org
2021-08-10powerpc/pseries/pci: Drop unused MSI codeCédric Le Goater
MSIs should be fully managed by the PCI and IRQ subsystems now. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210701132750.1475580-25-clg@kaod.org
2021-08-10powerpc/xics: Drop unmask of MSIs at startupCédric Le Goater
That was a workaround in the XICS domain because of the lack of MSI domain. This is now handled. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210701132750.1475580-24-clg@kaod.org
2021-08-10powerpc/pci: Drop XIVE restriction on MSI domainsCédric Le Goater
The PowerNV and pSeries platforms now have support for both the XICS and XIVE IRQ domains. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210701132750.1475580-23-clg@kaod.org
2021-08-10powerpc/powernv/pci: Customize the MSI EOI handler to support PHB3Cédric Le Goater
PHB3s need an extra OPAL call to EOI the interrupt. The call takes an OPAL HW IRQ number but it is translated into a vector number in OPAL. Here, we directly use the vector number of the in-the-middle "PNV-MSI" domain instead of grabbing the OPAL HW IRQ number in the XICS parent domain. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210701132750.1475580-22-clg@kaod.org
2021-08-10powerpc/xics: Add support for IRQ domain hierarchyCédric Le Goater
XICS doesn't have any state associated with the IRQ. The support is straightforward and simpler than for XIVE. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210701132750.1475580-21-clg@kaod.org
2021-08-10powerpc/xics: Add debug logging to the set_irq_affinity handlersCédric Le Goater
It really helps to know how the HW is configured when tweaking the IRQ subsystem. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210701132750.1475580-20-clg@kaod.org
2021-08-10powerpc/xics: Give a name to the default XICS IRQ domainCédric Le Goater
and clean up the error path. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210701132750.1475580-19-clg@kaod.org
2021-08-10powerpc/xics: Rename the map handler in a check handlerCédric Le Goater
This moves the IRQ initialization done under the different ICS backends in the common part of XICS. The 'map' handler becomes a simple 'check' on the HW IRQ at the FW level. As we don't need an ICS anymore in xics_migrate_irqs_away(), the XICS domain does not set a chip data for the IRQ. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210701132750.1475580-18-clg@kaod.org
2021-08-10powerpc/xics: Remove ICS listCédric Le Goater
We always had only one ICS per machine. Simplify the XICS driver by removing the ICS list. The ICS stored in the chip data of the XICS domain becomes useless and we don't need it anymore to migrate away IRQs from a CPU. This will be removed in a subsequent patch. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210701132750.1475580-17-clg@kaod.org
2021-08-10KVM: PPC: Book3S HV: XIVE: Fix mapping of passthrough interruptsCédric Le Goater
PCI MSI interrupt numbers are now mapped in a PCI-MSI domain but the underlying calls handling the passthrough of the interrupt in the guest need a number in the XIVE IRQ domain. Use the IRQ data mapped in the XIVE IRQ domain and not the one in the PCI-MSI domain. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210701132750.1475580-16-clg@kaod.org
2021-08-10KVM: PPC: Book3S HV: XIVE: Change interface of passthrough interrupt routinesCédric Le Goater
The routine kvmppc_set_passthru_irq() calls kvmppc_xive_set_mapped() and kvmppc_xive_clr_mapped() with an IRQ descriptor. Use directly the host IRQ number to remove a useless conversion. Add some debug. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210701132750.1475580-15-clg@kaod.org
2021-08-10KVM: PPC: Book3S HV: Use the new IRQ chip to detect passthrough interruptsCédric Le Goater
Passthrough PCI MSI interrupts are detected in KVM with a check on a specific EOI handler (P8) or on XIVE (P9). We can now check the PCI-MSI IRQ chip which is cleaner. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210701132750.1475580-14-clg@kaod.org
2021-08-10powerpc/powernv/pci: Add MSI domainsCédric Le Goater
This is very similar to the MSI domains of the pSeries platform. The MSI allocator is directly handled under the Linux PHB in the in-the-middle "PNV-MSI" domain. Only the XIVE (P9/P10) parent domain is supported for now. Support for XICS will come later. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210701132750.1475580-13-clg@kaod.org
2021-08-10powerpc/powernv/pci: Introduce __pnv_pci_ioda_msi_setup()Cédric Le Goater
It will be used as a 'compose_msg' handler of the MSI domain introduced later. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210701132750.1475580-12-clg@kaod.org
2021-08-10powerpc/pseries/pci: Add support of MSI domains to PHB hotplugCédric Le Goater
Simply allocate or release the MSI domains when a PHB is inserted in or removed from the machine. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210701132750.1475580-11-clg@kaod.org
2021-08-10powerpc/pseries/pci: Add a msi_free() handler to clear XIVE dataCédric Le Goater
The MSI domain clears the IRQ with msi_domain_free(), which calls irq_domain_free_irqs_top(), which clears the handler data. This is a problem for the XIVE controller since we need to unmap MMIO pages and free a specific XIVE structure. The 'msi_free()' handler is called before irq_domain_free_irqs_top() when the handler data is still available. Use that to clear the XIVE controller data. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210701132750.1475580-10-clg@kaod.org
2021-08-10powerpc/pseries/pci: Add a domain_free_irqs() handlerCédric Le Goater
The RTAS firmware can not disable one MSI at a time. It's all or nothing. We need a custom free IRQ handler for that. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210701132750.1475580-9-clg@kaod.org
2021-08-10powerpc/xive: Remove irqd_is_started() check when setting the affinityCédric Le Goater
In the early days of XIVE support, commit cffb717ceb8e ("powerpc/xive: Ensure active irqd when setting affinity") tried to fix an issue related to interrupt migration. If the root cause was related to CPU unplug, it should have been fixed and there is no reason to keep the irqd_is_started() check. This test is also breaking affinity setting of MSIs which can set before starting the associated IRQ. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210701132750.1475580-8-clg@kaod.org
2021-08-10powerpc/xive: Drop unmask of MSIs at startupCédric Le Goater
That was a workaround in the XIVE domain because of the lack of MSI domain. This is now handled. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210701132750.1475580-7-clg@kaod.org
2021-08-10powerpc/pseries/pci: Add MSI domainsCédric Le Goater
Two IRQ domains are added on top of default machine IRQ domain. First, the top level "pSeries-PCI-MSI" domain deals with the MSI specificities. In this domain, the HW IRQ numbers are generated by the PCI MSI layer, they compose a unique ID for an MSI source with the PCI device identifier and the MSI vector number. These numbers can be quite large on a pSeries machine running under the IBM Hypervisor and /sys/kernel/irq/ and /proc/interrupts will require small fixes to show them correctly. Second domain is the in-the-middle "pSeries-MSI" domain which acts as a proxy between the PCI MSI subsystem and the machine IRQ subsystem. It usually allocate the MSI vector numbers but, on pSeries machines, this is done by the RTAS FW and RTAS returns IRQ numbers in the IRQ number space of the machine. This is why the in-the-middle "pSeries-MSI" domain has the same HW IRQ numbers as its parent domain. Only the XIVE (P9/P10) parent domain is supported for now. We still need to add support for IRQ domain hierarchy under XICS. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210701132750.1475580-6-clg@kaod.org
2021-08-10powerpc/xive: Ease debugging of xive_irq_set_affinity()Cédric Le Goater
pr_debug() is easier to activate and it helps to know how the kernel configures the HW when tweaking the IRQ subsystem. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210701132750.1475580-5-clg@kaod.org
2021-08-10powerpc/xive: Add support for IRQ domain hierarchyCédric Le Goater
This adds handlers to allocate/free IRQs in a domain hierarchy. We could try to use xive_irq_domain_map() in xive_irq_domain_alloc() but we rely on xive_irq_alloc_data() to set the IRQ handler data and duplicating the code is simpler. xive_irq_free_data() needs to be called when IRQ are freed to clear the MMIO mappings and free the XIVE handler data, xive_irq_data structure. This is going to be a problem with MSI domains which we will address later. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210701132750.1475580-4-clg@kaod.org
2021-08-10powerpc/pseries/pci: Introduce rtas_prepare_msi_irqs()Cédric Le Goater
This splits the routine setting the MSIs in two parts: allocation of MSIs for the PCI device at the FW level (RTAS) and the actual mapping and activation of the IRQs. rtas_prepare_msi_irqs() will serve as a handler for the PCI MSI domain. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210701132750.1475580-3-clg@kaod.org
2021-08-10powerpc/pseries/pci: Introduce __find_pe_total_msi()Cédric Le Goater
It will help to size the PCI MSI domain. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210701132750.1475580-2-clg@kaod.org
2021-08-10KVM: PPC: Use arch_get_random_seed_long instead of powernv variantAlexey Kardashevskiy
The powernv_get_random_long() does not work in nested KVM (which is pseries) and produces a crash when accessing in_be64(rng->regs) in powernv_get_random_long(). This replaces powernv_get_random_long with the ppc_md machine hook wrapper. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210805075649.2086567-1-aik@ozlabs.ru
2021-08-10powerpc/configs: Disable legacy ptys on microwatt defconfigAnton Blanchard
We shouldn't need legacy ptys, and disabling the option improves boot time by about 0.5 seconds. Signed-off-by: Anton Blanchard <anton@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210805112005.3cb1f412@kryten.localdomain
2021-08-10powerpc: Always inline radix_enabled() to fix build failureJordan Niethe
This is the same as commit acdad8fb4a15 ("powerpc: Force inlining of mmu_has_feature to fix build failure") but for radix_enabled(). The config in the linked bugzilla causes the following build failure: LD .tmp_vmlinux.kallsyms1 powerpc64-linux-ld: arch/powerpc/mm/pgtable.o: in function `.__ptep_set_access_flags': pgtable.c:(.text+0x17c): undefined reference to `.radix__ptep_set_access_flags' powerpc64-linux-ld: arch/powerpc/mm/pageattr.o: in function `.change_page_attr': pageattr.c:(.text+0xc0): undefined reference to `.radix__flush_tlb_kernel_range' etc. This is due to radix_enabled() not being inlined. See extract from building with -Winline: In file included from arch/powerpc/include/asm/lppaca.h:46, from arch/powerpc/include/asm/paca.h:17, from arch/powerpc/include/asm/current.h:13, from include/linux/thread_info.h:23, from include/asm-generic/preempt.h:5, from ./arch/powerpc/include/generated/asm/preempt.h:1, from include/linux/preempt.h:78, from include/linux/spinlock.h:51, from include/linux/mmzone.h:8, from include/linux/gfp.h:6, from arch/powerpc/mm/pgtable.c:21: arch/powerpc/include/asm/book3s/64/pgtable.h: In function '__ptep_set_access_flags': arch/powerpc/include/asm/mmu.h:327:20: error: inlining failed in call to 'radix_enabled': call is unlikely and code size would grow [-Werror=inline] The code relies on constant folding of MMU_FTRS_POSSIBLE at buildtime and elimination of non possible parts of code at compile time. For this to work radix_enabled() must be inlined so make it __always_inline. Reported-by: Erhard F. <erhard_f@mailbox.org> Suggested-by: Michael Ellerman <mpe@ellerman.id.au> Tested-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Jordan Niethe <jniethe5@gmail.com> [mpe: Trimmed error messages in change log] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://bugzilla.kernel.org/show_bug.cgi?id=213803 Link: https://lore.kernel.org/r/20210804013724.514468-1-jniethe5@gmail.com
2021-08-10powerpc: Replace deprecated CPU-hotplug functions.Sebastian Andrzej Siewior
The functions get_online_cpus() and put_online_cpus() have been deprecated during the CPU hotplug rework. They map directly to cpus_read_lock() and cpus_read_unlock(). Replace deprecated CPU-hotplug functions with the official version. The behavior remains unchanged. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210803141621.780504-4-bigeasy@linutronix.de
2021-08-10powerpc/kexec: fix for_each_child.cocci warningkernel test robot
for_each_node_by_type should have of_node_put() before return. Generated by: scripts/coccinelle/iterators/for_each_child.cocci Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: kernel test robot <lkp@intel.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/alpine.DEB.2.22.394.2108031654080.17639@hadrien
2021-08-10powerpc/pseries: Prevent free CPU ids being reused on another nodeLaurent Dufour
When a CPU is hot added, the CPU ids are taken from the available mask from the lower possible set. If that set of values was previously used for a CPU attached to a different node, it appears to an application as if these CPUs have migrated from one node to another node which is not expected. To prevent this, it is needed to record the CPU ids used for each node and to not reuse them on another node. However, to prevent CPU hot plug to fail, in the case the CPU ids is starved on a node, the capability to reuse other nodes’ free CPU ids is kept. A warning is displayed in such a case to warn the user. A new CPU bit mask (node_recorded_ids_map) is introduced for each possible node. It is populated with the CPU onlined at boot time, and then when a CPU is hot plugged to a node. The bits in that mask remain when the CPU is hot unplugged, to remind this CPU ids have been used for this node. If no id set was found, a retry is made without removing the ids used on the other nodes to try reusing them. This is the way ids have been allocated prior to this patch. The effect of this patch can be seen by removing and adding CPUs using the Qemu monitor. In the following case, the first CPU from the node 2 is removed, then the first one from the node 1 is removed too. Later, the first CPU of the node 2 is added back. Without that patch, the kernel will number these CPUs using the first CPU ids available which are the ones freed when removing the second CPU of the node 0. This leads to the CPU ids 16-23 to move from the node 1 to the node 2. With the patch applied, the CPU ids 32-39 are used since they are the lowest free ones which have not been used on another node. At boot time: [root@vm40 ~]# numactl -H | grep cpus node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 node 1 cpus: 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 node 2 cpus: 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 Vanilla kernel, after the CPU hot unplug/plug operations: [root@vm40 ~]# numactl -H | grep cpus node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 node 1 cpus: 24 25 26 27 28 29 30 31 node 2 cpus: 16 17 18 19 20 21 22 23 40 41 42 43 44 45 46 47 Patched kernel, after the CPU hot unplug/plug operations: [root@vm40 ~]# numactl -H | grep cpus node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 node 1 cpus: 24 25 26 27 28 29 30 31 node 2 cpus: 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com> Reviewed-by: Nathan Lynch <nathanl@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210429174908.16613-1-ldufour@linux.ibm.com
2021-08-10pseries/drmem: update LMBs after LPMLaurent Dufour
After a LPM, the device tree node ibm,dynamic-reconfiguration-memory may be updated by the hypervisor in the case the NUMA topology of the LPAR's memory is updated. This is handled by the kernel, but the memory's node is not updated because there is no way to move a memory block between nodes from the Linux kernel point of view. If later a memory block is added or removed, drmem_update_dt() is called and it is overwriting the DT node ibm,dynamic-reconfiguration-memory to match the added or removed LMB. But the LMB's associativity node has not been updated after the DT node update and thus the node is overwritten by the Linux's topology instead of the hypervisor one. Introduce a hook called when the ibm,dynamic-reconfiguration-memory node is updated to force an update of the LMB's associativity. However, ignore the call to that hook when the update has been triggered by drmem_update_dt(). Because, in that case, the LMB tree has been used to set the DT property and thus it doesn't need to be updated back. Since drmem_update_dt() is called under the protection of the device_hotplug_lock and the hook is called in the same context, use a simple boolean variable to detect that call. Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com> Reviewed-by: Nathan Lynch <nathanl@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210517090606.56930-1-ldufour@linux.ibm.com
2021-08-10powerpc/numa: Consider the max NUMA node for migratable LPARLaurent Dufour
When a LPAR is migratable, we should consider the maximum possible NUMA node instead of the number of NUMA nodes from the actual system. The DT property 'ibm,current-associativity-domains' defines the maximum number of nodes the LPAR can see when running on that box. But if the LPAR is being migrated on another box, it may see up to the nodes defined by 'ibm,max-associativity-domains'. So if a LPAR is migratable, that value should be used. Unfortunately, there is no easy way to know if an LPAR is migratable or not. The hypervisor exports the property 'ibm,migratable-partition' in the case it set to migrate partition, but that would not mean that the current partition is migratable. Without this patch, when a LPAR is started on a 2 node box and then migrated to a 3 node box, the hypervisor may spread the LPAR's CPUs on the 3rd node. In that case if a CPU from that 3rd node is added to the LPAR, it will be wrongly assigned to the node because the kernel has been set to use up to 2 nodes (the configuration of the departure node). With this patch applies, the CPU is correctly added to the 3rd node. Fixes: f9f130ff2ec9 ("powerpc/numa: Detect support for coregroup") Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com> Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210511073136.17795-1-ldufour@linux.ibm.com
2021-08-10powerpc/non-smp: Unconditionaly call smp_mb() on switch_mmChristophe Leroy
Commit 3ccfebedd8cf ("powerpc, membarrier: Skip memory barrier in switch_mm()") added some logic to skip the smp_mb() in switch_mm_irqs_off() before the call to switch_mmu_context(). However, on non SMP smp_mb() is just a compiler barrier and doing it unconditionaly is simpler than the logic used to check whether the barrier is needed or not. After the patch: 00000000 <switch_mm_irqs_off>: ... c: 7c 04 18 40 cmplw r4,r3 10: 81 24 00 24 lwz r9,36(r4) 14: 91 25 04 c8 stw r9,1224(r5) 18: 4d 82 00 20 beqlr 1c: 48 00 00 00 b 1c <switch_mm_irqs_off+0x1c> 1c: R_PPC_REL24 switch_mmu_context Before the patch: 00000000 <switch_mm_irqs_off>: ... c: 7c 04 18 40 cmplw r4,r3 10: 81 24 00 24 lwz r9,36(r4) 14: 91 25 04 c8 stw r9,1224(r5) 18: 4d 82 00 20 beqlr 1c: 81 24 00 28 lwz r9,40(r4) 20: 71 29 00 0a andi. r9,r9,10 24: 40 82 00 34 bne 58 <switch_mm_irqs_off+0x58> 28: 48 00 00 00 b 28 <switch_mm_irqs_off+0x28> 28: R_PPC_REL24 switch_mmu_context ... 58: 2c 03 00 00 cmpwi r3,0 5c: 41 82 ff cc beq 28 <switch_mm_irqs_off+0x28> 60: 48 00 00 00 b 60 <switch_mm_irqs_off+0x60> 60: R_PPC_REL24 switch_mmu_context Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/e9d501da0c59f60ca767b1b3ea4603fce6d02b9e.1625486440.git.christophe.leroy@csgroup.eu
2021-08-10powerpc: Remove in_kernel_text()Christophe Leroy
Last user of in_kernel_text() stopped using in with commit 549e8152de80 ("powerpc: Make the 64-bit kernel as a position-independent executable"). Generic function is_kernel_text() does the same. So remote it. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/2a3a5b6f8cc0ef4e854d7b764f66aa8d2ee270d2.1624813698.git.christophe.leroy@csgroup.eu
2021-08-04powerpc/64s/perf: Always use SIAR for kernel interruptsNicholas Piggin
If an interrupt is taken in kernel mode, always use SIAR for it rather than looking at regs_sipr. This prevents samples piling up around interrupt enable (hard enable or interrupt replay via soft enable) in PMUs / modes where the PR sample indication is not in synch with SIAR. This results in better sampling of interrupt entry and exit in particular. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Tested-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210720141504.420110-1-npiggin@gmail.com
2021-08-04powerpc/smp: Use existing L2 cache_map cpumask to find L3 cache siblingsParth Shah
On POWER10 systems, the "ibm,thread-groups" property "2" indicates the cpus in thread-group share both L2 and L3 caches. Hence, use cache_property = 2 itself to find both the L2 and L3 cache siblings. Hence, create a new thread_group_l3_cache_map to keep list of L3 siblings, but fill the mask using same property "2" array. Signed-off-by: Parth Shah <parth@linux.ibm.com> Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210728175607.591679-4-parth@linux.ibm.com
2021-08-04powerpc/cacheinfo: Remove the redundant get_shared_cpu_map()Gautham R. Shenoy
The helper function get_shared_cpu_map() was added in 'commit 500fe5f550ec ("powerpc/cacheinfo: Report the correct shared_cpu_map on big-cores")' and subsequently expanded upon in 'commit 0be47634db0b ("powerpc/cacheinfo: Print correct cache-sibling map/list for L2 cache")' in order to help report the correct groups of threads sharing these caches on big-core systems where groups of threads within a core can share different sets of caches. Now that powerpc/cacheinfo is aware of "ibm,thread-groups" property, cache->shared_cpu_map contains the correct set of thread-siblings sharing the cache. Hence we no longer need the functions get_shared_cpu_map(). This patch removes this function. We also remove the helper function index_dir_to_cpu() which was only called by get_shared_cpu_map(). With these functions removed, we can still see the correct cache-sibling map/list for L1 and L2 caches on systems with L1 and L2 caches distributed among groups of threads in a core. With this patch, on a SMT8 POWER10 system where the L1 and L2 caches are split between the two groups of threads in a core, for CPUs 8,9, the L1-Data, L1-Instruction, L2, L3 cache CPU sibling list is as follows: $ grep . /sys/devices/system/cpu/cpu[89]/cache/index[0123]/shared_cpu_list /sys/devices/system/cpu/cpu8/cache/index0/shared_cpu_list:8,10,12,14 /sys/devices/system/cpu/cpu8/cache/index1/shared_cpu_list:8,10,12,14 /sys/devices/system/cpu/cpu8/cache/index2/shared_cpu_list:8,10,12,14 /sys/devices/system/cpu/cpu8/cache/index3/shared_cpu_list:8-15 /sys/devices/system/cpu/cpu9/cache/index0/shared_cpu_list:9,11,13,15 /sys/devices/system/cpu/cpu9/cache/index1/shared_cpu_list:9,11,13,15 /sys/devices/system/cpu/cpu9/cache/index2/shared_cpu_list:9,11,13,15 /sys/devices/system/cpu/cpu9/cache/index3/shared_cpu_list:8-15 $ ppc64_cpu --smt=4 $ grep . /sys/devices/system/cpu/cpu[89]/cache/index[0123]/shared_cpu_list /sys/devices/system/cpu/cpu8/cache/index0/shared_cpu_list:8,10 /sys/devices/system/cpu/cpu8/cache/index1/shared_cpu_list:8,10 /sys/devices/system/cpu/cpu8/cache/index2/shared_cpu_list:8,10 /sys/devices/system/cpu/cpu8/cache/index3/shared_cpu_list:8-11 /sys/devices/system/cpu/cpu9/cache/index0/shared_cpu_list:9,11 /sys/devices/system/cpu/cpu9/cache/index1/shared_cpu_list:9,11 /sys/devices/system/cpu/cpu9/cache/index2/shared_cpu_list:9,11 /sys/devices/system/cpu/cpu9/cache/index3/shared_cpu_list:8-11 $ ppc64_cpu --smt=2 $ grep . /sys/devices/system/cpu/cpu[89]/cache/index[0123]/shared_cpu_list /sys/devices/system/cpu/cpu8/cache/index0/shared_cpu_list:8 /sys/devices/system/cpu/cpu8/cache/index1/shared_cpu_list:8 /sys/devices/system/cpu/cpu8/cache/index2/shared_cpu_list:8 /sys/devices/system/cpu/cpu8/cache/index3/shared_cpu_list:8-9 /sys/devices/system/cpu/cpu9/cache/index0/shared_cpu_list:9 /sys/devices/system/cpu/cpu9/cache/index1/shared_cpu_list:9 /sys/devices/system/cpu/cpu9/cache/index2/shared_cpu_list:9 /sys/devices/system/cpu/cpu9/cache/index3/shared_cpu_list:8-9 $ ppc64_cpu --smt=1 $ grep . /sys/devices/system/cpu/cpu[89]/cache/index[0123]/shared_cpu_list /sys/devices/system/cpu/cpu8/cache/index0/shared_cpu_list:8 /sys/devices/system/cpu/cpu8/cache/index1/shared_cpu_list:8 /sys/devices/system/cpu/cpu8/cache/index2/shared_cpu_list:8 /sys/devices/system/cpu/cpu8/cache/index3/shared_cpu_list:8 Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210728175607.591679-3-parth@linux.ibm.com
2021-08-04powerpc/cacheinfo: Lookup cache by dt node and thread-group idGautham R. Shenoy
Currently the cacheinfo code on powerpc indexes the "cache" objects (modelling the L1/L2/L3 caches) where the key is device-tree node corresponding to that cache. On some of the POWER server platforms thread-groups within the core share different sets of caches (Eg: On SMT8 POWER9 systems, threads 0,2,4,6 of a core share L1 cache and threads 1,3,5,7 of the same core share another L1 cache). On such platforms, there is a single device-tree node corresponding to that cache and the cache-configuration within the threads of the core is indicated via "ibm,thread-groups" device-tree property. Since the current code is not aware of the "ibm,thread-groups" property, on the aforementoined systems, cacheinfo code still treats all the threads in the core to be sharing the cache because of the single device-tree node (In the earlier example, the cacheinfo code would says CPUs 0-7 share L1 cache). In this patch, we make the powerpc cacheinfo code aware of the "ibm,thread-groups" property. We indexe the "cache" objects by the key-pair (device-tree node, thread-group id). For any CPUX, for a given level of cache, the thread-group id is defined to be the first CPU in the "ibm,thread-groups" cache-group containing CPUX. For levels of cache which are not represented in "ibm,thread-groups" property, the thread-group id is -1. [parth: Remove "static" keyword for the definition of "thread_group_l1_cache_map" and "thread_group_l2_cache_map" to get rid of the compile error.] Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Parth Shah <parth@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210728175607.591679-2-parth@linux.ibm.com
2021-08-04powerpc: move the install rule to arch/powerpc/MakefileMasahiro Yamada
Currently, the install target in arch/powerpc/Makefile descends into arch/powerpc/boot/Makefile to invoke the shell script, but there is no good reason to do so. arch/powerpc/Makefile can run the shell script directly. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210729141937.445051-3-masahiroy@kernel.org
2021-08-04powerpc: make the install target not depend on any build artifactMasahiro Yamada
The install target should not depend on any build artifact. The reason is explained in commit 19514fc665ff ("arm, kbuild: make "make install" not depend on vmlinux"). Change the PowerPC installation code in a similar way. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210729141937.445051-2-masahiroy@kernel.org
2021-08-04powerpc: remove unused zInstall target from arch/powerpc/boot/MakefileMasahiro Yamada
Commit c913e5f95e54 ("powerpc/boot: Don't install zImage.* from make install") added the zInstall target to arch/powerpc/boot/Makefile, but you cannot use it since the corresponding hook is missing in arch/powerpc/Makefile. It has never worked since its addition. Nobody has complained about it for 7 years, which means this code was unneeded. With this removal, the install.sh will be passed in with 4 parameters. Simplify the shell script. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210729141937.445051-1-masahiroy@kernel.org
2021-08-03powerpc/stacktrace: Include linux/delay.hMichal Suchanek
commit 7c6986ade69e ("powerpc/stacktrace: Fix spurious "stale" traces in raise_backtrace_ipi()") introduces udelay() call without including the linux/delay.h header. This may happen to work on master but the header that declares the functionshould be included nonetheless. Fixes: 7c6986ade69e ("powerpc/stacktrace: Fix spurious "stale" traces in raise_backtrace_ipi()") Signed-off-by: Michal Suchanek <msuchanek@suse.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210729180103.15578-1-msuchanek@suse.de