aboutsummaryrefslogtreecommitdiff
path: root/arch/powerpc/perf/imc-pmu.c
AgeCommit message (Collapse)Author
2023-11-20powerpc/imc-pmu: Use the correct spinlock initializer.Sebastian Andrzej Siewior
[ Upstream commit 007240d59c11f87ac4f6cfc6a1d116630b6b634c ] The macro __SPIN_LOCK_INITIALIZER() is implementation specific. Users that desire to initialize a spinlock in a struct must use __SPIN_LOCK_UNLOCKED(). Use __SPIN_LOCK_UNLOCKED() for the spinlock_t in imc_global_refc. Fixes: 76d588dddc459 ("powerpc/imc-pmu: Fix use of mutex in IRQs disabled section") Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230309134831.Nz12nqsU@linutronix.de Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-02-09powerpc/imc-pmu: Revert nest_init_lock to being a mutexMichael Ellerman
commit ad53db4acb415976761d7302f5b02e97f2bd097e upstream. The recent commit 76d588dddc45 ("powerpc/imc-pmu: Fix use of mutex in IRQs disabled section") fixed warnings (and possible deadlocks) in the IMC PMU driver by converting the locking to use spinlocks. It also converted the init-time nest_init_lock to a spinlock, even though it's not used at runtime in IRQ disabled sections or while holding other spinlocks. This leads to warnings such as: BUG: sleeping function called from invalid context at include/linux/percpu-rwsem.h:49 in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1, name: swapper/0 preempt_count: 1, expected: 0 CPU: 7 PID: 1 Comm: swapper/0 Not tainted 6.2.0-rc2-14719-gf12cd06109f4-dirty #1 Hardware name: Mambo,Simulated-System POWER9 0x4e1203 opal:v6.6.6 PowerNV Call Trace: dump_stack_lvl+0x74/0xa8 (unreliable) __might_resched+0x178/0x1a0 __cpuhp_setup_state+0x64/0x1e0 init_imc_pmu+0xe48/0x1250 opal_imc_counters_probe+0x30c/0x6a0 platform_probe+0x78/0x110 really_probe+0x104/0x420 __driver_probe_device+0xb0/0x170 driver_probe_device+0x58/0x180 __driver_attach+0xd8/0x250 bus_for_each_dev+0xb4/0x140 driver_attach+0x34/0x50 bus_add_driver+0x1e8/0x2d0 driver_register+0xb4/0x1c0 __platform_driver_register+0x38/0x50 opal_imc_driver_init+0x2c/0x40 do_one_initcall+0x80/0x360 kernel_init_freeable+0x310/0x3b8 kernel_init+0x30/0x1a0 ret_from_kernel_thread+0x5c/0x64 Fix it by converting nest_init_lock back to a mutex, so that we can call sleeping functions while holding it. There is no interaction between nest_init_lock and the runtime spinlocks used by the actual PMU routines. Fixes: 76d588dddc45 ("powerpc/imc-pmu: Fix use of mutex in IRQs disabled section") Tested-by: Kajol Jain<kjain@linux.ibm.com> Reviewed-by: Kajol Jain<kjain@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20230130014401.540543-1-mpe@ellerman.id.au Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-01-18powerpc/imc-pmu: Fix use of mutex in IRQs disabled sectionKajol Jain
commit 76d588dddc459fefa1da96e0a081a397c5c8e216 upstream. Current imc-pmu code triggers a WARNING with CONFIG_DEBUG_ATOMIC_SLEEP and CONFIG_PROVE_LOCKING enabled, while running a thread_imc event. Command to trigger the warning: # perf stat -e thread_imc/CPM_CS_FROM_L4_MEM_X_DPTEG/ sleep 5 Performance counter stats for 'sleep 5': 0 thread_imc/CPM_CS_FROM_L4_MEM_X_DPTEG/ 5.002117947 seconds time elapsed 0.000131000 seconds user 0.001063000 seconds sys Below is snippet of the warning in dmesg: BUG: sleeping function called from invalid context at kernel/locking/mutex.c:580 in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 2869, name: perf-exec preempt_count: 2, expected: 0 4 locks held by perf-exec/2869: #0: c00000004325c540 (&sig->cred_guard_mutex){+.+.}-{3:3}, at: bprm_execve+0x64/0xa90 #1: c00000004325c5d8 (&sig->exec_update_lock){++++}-{3:3}, at: begin_new_exec+0x460/0xef0 #2: c0000003fa99d4e0 (&cpuctx_lock){-...}-{2:2}, at: perf_event_exec+0x290/0x510 #3: c000000017ab8418 (&ctx->lock){....}-{2:2}, at: perf_event_exec+0x29c/0x510 irq event stamp: 4806 hardirqs last enabled at (4805): [<c000000000f65b94>] _raw_spin_unlock_irqrestore+0x94/0xd0 hardirqs last disabled at (4806): [<c0000000003fae44>] perf_event_exec+0x394/0x510 softirqs last enabled at (0): [<c00000000013c404>] copy_process+0xc34/0x1ff0 softirqs last disabled at (0): [<0000000000000000>] 0x0 CPU: 36 PID: 2869 Comm: perf-exec Not tainted 6.2.0-rc2-00011-g1247637727f2 #61 Hardware name: 8375-42A POWER9 0x4e1202 opal:v7.0-16-g9b85f7d961 PowerNV Call Trace: dump_stack_lvl+0x98/0xe0 (unreliable) __might_resched+0x2f8/0x310 __mutex_lock+0x6c/0x13f0 thread_imc_event_add+0xf4/0x1b0 event_sched_in+0xe0/0x210 merge_sched_in+0x1f0/0x600 visit_groups_merge.isra.92.constprop.166+0x2bc/0x6c0 ctx_flexible_sched_in+0xcc/0x140 ctx_sched_in+0x20c/0x2a0 ctx_resched+0x104/0x1c0 perf_event_exec+0x340/0x510 begin_new_exec+0x730/0xef0 load_elf_binary+0x3f8/0x1e10 ... do not call blocking ops when !TASK_RUNNING; state=2001 set at [<00000000fd63e7cf>] do_nanosleep+0x60/0x1a0 WARNING: CPU: 36 PID: 2869 at kernel/sched/core.c:9912 __might_sleep+0x9c/0xb0 CPU: 36 PID: 2869 Comm: sleep Tainted: G W 6.2.0-rc2-00011-g1247637727f2 #61 Hardware name: 8375-42A POWER9 0x4e1202 opal:v7.0-16-g9b85f7d961 PowerNV NIP: c000000000194a1c LR: c000000000194a18 CTR: c000000000a78670 REGS: c00000004d2134e0 TRAP: 0700 Tainted: G W (6.2.0-rc2-00011-g1247637727f2) MSR: 9000000000021033 <SF,HV,ME,IR,DR,RI,LE> CR: 48002824 XER: 00000000 CFAR: c00000000013fb64 IRQMASK: 1 The above warning triggered because the current imc-pmu code uses mutex lock in interrupt disabled sections. The function mutex_lock() internally calls __might_resched(), which will check if IRQs are disabled and in case IRQs are disabled, it will trigger the warning. Fix the issue by changing the mutex lock to spinlock. Fixes: 8f95faaac56c ("powerpc/powernv: Detect and create IMC device") Reported-by: Michael Petlan <mpetlan@redhat.com> Reported-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Kajol Jain <kjain@linux.ibm.com> [mpe: Fix comments, trim oops in change log, add reported-by tags] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20230106065157.182648-1-kjain@linux.ibm.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-09-05powerpc/perf: Add missing of_node_put()s in imc-pmu.cLiang He
In update_events_in_group(), of_find_node_by_phandle() will return a node pointer with refcount incremented. The reference should be dropped with of_node_put() in the failure path or when it is not used anymore. Signed-off-by: Liang He <windhl@126.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Link: https://lore.kernel.org/r/20220618071353.4059000-1-windhl@126.com
2022-05-08powerpc: Add missing headersChristophe Leroy
Don't inherit headers "by chances" from asm/prom.h, asm/mpc52xx.h, asm/pci.h etc... Include the needed headers, and remove asm/prom.h when it was needed exclusively for pulling necessary headers. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/be8bdc934d152a7d8ee8d1a840d5596e2f7d85e0.1646767214.git.christophe.leroy@csgroup.eu
2022-05-05powerpc: fix typos in commentsJulia Lawall
Various spelling mistakes in comments. Detected with the help of Coccinelle. Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Reviewed-by: Joel Stanley <joel@jms.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220430185654.5855-1-Julia.Lawall@inria.fr
2022-03-08powerpc: declare unmodified attribute_group usages constRohan McLure
Inspired by (bd75b4ef4977: Constify static attribute_group structs), accepted by linux-next, reported: https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20220210202805.7750-4-rikard.falkeborn@gmail.com/ Nearly all singletons of type struct attribute_group are never modified, and so are candidates for being const. Declare them as const. Signed-off-by: Rohan McLure <rmclure@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220307231414.86560-1-rmclure@linux.ibm.com
2022-02-03powerpc/perf: Don't use perf_hw_context for trace IMC PMUAthira Rajeev
Trace IMC (In-Memory collection counters) in powerpc is useful for application level profiling. For trace_imc, presently task context (task_ctx_nr) is set to perf_hw_context. But perf_hw_context should only be used for CPU PMU. See commit 26657848502b ("perf/core: Verify we have a single perf_hw_context PMU"). So for trace_imc, even though it is per thread PMU, it is preferred to use sw_context in order to be able to do application level monitoring. Hence change the task_ctx_nr to use perf_sw_context. Fixes: 012ae244845f ("powerpc/perf: Trace imc PMU functions") Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> [mpe: Update subject & incorporate notes into change log, reflow comment] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220202041837.65968-1-atrajeev@linux.vnet.ibm.com
2020-11-18powerpc: fix -Wimplicit-fallthroughNick Desaulniers
The "fallthrough" pseudo-keyword was added as a portable way to denote intentional fallthrough. Clang will still warn on cases where there is a fallthrough to an immediate break. Add explicit breaks for those cases. Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Tested-by: Nathan Chancellor <natechancellor@gmail.com> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: Miguel Ojeda <ojeda@kernel.org> Reviewed-by: Nathan Chancellor <natechancellor@gmail.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://github.com/ClangBuiltLinux/linux/issues/236 Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
2020-11-09perf: Reduce stack usage of perf_output_begin()Peter Zijlstra
__perf_output_begin() has an on-stack struct perf_sample_data in the unlikely case it needs to generate a LOST record. However, every call to perf_output_begin() must already have a perf_sample_data on-stack. Reported-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20201030151954.985416146@infradead.org
2020-09-14Merge branch 'fixes' into nextMichael Ellerman
Bring in our fixes branch for this cycle which avoids some small conflicts with upcoming commits.
2020-08-27powerpc/perf: Fix reading of MSR[HV/PR] bits in trace-imcAthira Rajeev
IMC trace-mode uses MSR[HV/PR] bits to set the cpumode for the instruction pointer captured in each sample. The bits are fetched from the third double word of the trace record. Reading third double word from IMC trace record should use be64_to_cpu() along with READ_ONCE inorder to fetch correct MSR[HV/PR] bits. Patch addresses this change. Currently we are using PERF_RECORD_MISC_HYPERVISOR as cpumode if MSR HV is 1 and PR is 0 which means the address is from host counter. But using PERF_RECORD_MISC_HYPERVISOR for host counter data will fail to resolve the address -> symbol during "perf report" because perf tools side uses PERF_RECORD_MISC_KERNEL to represent the host counter data. Therefore, fix the trace imc sample data to use PERF_RECORD_MISC_KERNEL as cpumode for host kernel information. Fixes: 77ca3951cc37 ("powerpc/perf: Add kernel support for new MSR[HV PR] bits in trace-imc") Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1598424029-1662-1-git-send-email-atrajeev@linux.vnet.ibm.com
2020-08-25powerpc/perf: Remove set but not used variable 'target'zhengbin
Fix gcc '-Wunused-but-set-variable' warning: arch/powerpc/perf/imc-pmu.c: In function trace_imc_event_init: arch/powerpc/perf/imc-pmu.c:1292:22: warning: variable target set but not used [-Wunused-but-set-variable] It is introduced by commit 012ae244845f ("powerpc/perf: Trace imc PMU functions"), but never used, so remove it. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: zhengbin <zhengbin13@huawei.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1574144074-142032-3-git-send-email-zhengbin13@huawei.com
2020-07-16powerpc/perf: Add kernel support for new MSR[HV PR] bits in trace-imcAnju T Sudhakar
IMC trace-mode record has MSR[HV PR] bits added in the third DW. These bits can be used to set the cpumode for the instruction pointer captured in each sample. Add support in kernel to use these bits to set the cpumode for each sample. Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200713144623.508695-1-maddy@linux.ibm.com
2020-04-16powerpc/perf: open access for CAP_PERFMON privileged processAlexey Budankov
Open access to monitoring for CAP_PERFMON privileged process. Providing the access under CAP_PERFMON capability singly, without the rest of CAP_SYS_ADMIN credentials, excludes chances to misuse the credentials and makes operation more secure. CAP_PERFMON implements the principle of least privilege for performance monitoring and observability operations (POSIX IEEE 1003.1e 2.2.2.39 principle of least privilege: A security design principle that states that a process or program be granted only those privileges (e.g., capabilities) necessary to accomplish its legitimate function, and only for the time that such privileges are actually required) For backward compatibility reasons access to the monitoring remains open for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN usage for secure monitoring is discouraged with respect to CAP_PERFMON capability. Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com> Reviewed-by: James Morris <jamorris@linux.microsoft.com> Acked-by: Anju T Sudhakar <anju@linux.vnet.ibm.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Igor Lubashev <ilubashe@akamai.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Serge Hallyn <serge@hallyn.com> Cc: Song Liu <songliubraving@fb.com> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: intel-gfx@lists.freedesktop.org Cc: linux-doc@vger.kernel.org Cc: linux-man@vger.kernel.org Cc: linux-security-module@vger.kernel.org Cc: selinux@vger.kernel.org Link: http://lore.kernel.org/lkml/ac98cd9f-b59e-673c-c70d-180b3e7695d2@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-04-03powerpc/perf: Implement a global lock to avoid races between trace, core and ↵Anju T Sudhakar
thread imc events. IMC(In-memory Collection Counters) does performance monitoring in two different modes, i.e accumulation mode(core-imc and thread-imc events), and trace mode(trace-imc events). A cpu thread can either be in accumulation-mode or trace-mode at a time and this is done via the LDBAR register in POWER architecture. The current design does not address the races between thread-imc and trace-imc events. Patch implements a global id and lock to avoid the races between core, trace and thread imc events. With this global id-lock implementation, the system can either run core, thread or trace imc events at a time. i.e. to run any core-imc events, thread/trace imc events should not be enabled/monitored. Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200313055238.8656-1-anju@linux.vnet.ibm.com
2019-08-20powerpc/perf: fix imc allocation failure handlingNicholas Piggin
The alloc_pages_node return value should be tested for failure before being passed to page_address. Tested-by: Anju T Sudhakar <anju@linux.vnet.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190724084638.24982-3-npiggin@gmail.com
2019-06-19powerpc/perf: Use cpumask_last() to determine the designated cpu for ↵Anju T Sudhakar
nest/core units. Nest and core IMC (In-Memory Collection counters) assigns a particular cpu as the designated target for counter data collection. During system boot, the first online cpu in a chip gets assigned as the designated cpu for that chip(for nest-imc) and the first online cpu in a core gets assigned as the designated cpu for that core(for core-imc). If the designated cpu goes offline, the next online cpu from the same chip(for nest-imc)/core(for core-imc) is assigned as the next target, and the event context is migrated to the target cpu. Currently, cpumask_any_but() function is used to find the target cpu. Though this function is expected to return a `random` cpu, this always returns the next online cpu. If all cpus in a chip/core is offlined in a sequential manner, starting from the first cpu, the event migration has to happen for all the cpus which goes offline. Since the migration process involves a grace period, the total time taken to offline all the cpus will be significantly high. Example: In a system which has 2 sockets, with NUMA node0 CPU(s): 0-87 NUMA node8 CPU(s): 88-175 Time taken to offline cpu 88-175: real 2m56.099s user 0m0.191s sys 0m0.000s Use cpumask_last() to choose the target cpu, when the designated cpu goes online, so the migration will happen only when the last_cpu in the mask goes offline. This way the time taken to offline all cpus in a chip/core can be reduced. With the patch: Time taken to offline cpu 88-175: real 0m12.207s user 0m0.171s sys 0m0.000s Offlining all cpus in reverse order is also taken care because, cpumask_any_but() is used to find the designated cpu if the last cpu in the mask goes offline. Since cpumask_any_but() always return the first cpu in the mask, that becomes the designated cpu and migration will happen only when the first_cpu in the mask goes offline. Example: With the patch, Time taken to offline cpu from 175-88: real 0m9.330s user 0m0.110s sys 0m0.000s Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com> Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-05-24treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 112Thomas Gleixner
Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 of the license or later version extracted by the scancode license scanner the SPDX license identifier GPL-2.0-or-later has been chosen to replace the boilerplate/reference in 4 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Allison Randal <allison@lohutok.net> Reviewed-by: Richard Fontana <rfontana@redhat.com> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190523091650.480557885@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-03powerpc/perf: Trace imc PMU functionsAnju T Sudhakar
Add PMU functions to support trace-imc. Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com> Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-05-03powerpc/perf: Trace imc events detection and cpuhotplugAnju T Sudhakar
Patch detects trace-imc events, does memory initilizations for each online cpu, and registers cpuhotplug call-backs. Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com> Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-05-03powerpc/perf: Add privileged access check for thread_imcMadhavan Srinivasan
Add code to restrict user access to thread_imc pmu since some event report privilege level information. Fixes: f74c89bd80fb3 ("powerpc/perf: Add thread IMC PMU support") Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-05-03powerpc/perf: Rearrange setting of ldbar for thread-imcAnju T Sudhakar
LDBAR holds the memory address allocated for each cpu. For thread-imc the mode bit (i.e bit 1) of LDBAR is set to accumulation. Currently, ldbar is loaded with per cpu memory address and mode set to accumulation at boot time. To enable trace-imc, the mode bit of ldbar should be set to 'trace'. So to accommodate trace-mode of IMC, reposition setting of ldbar for thread-imc to thread_imc_event_add(). Also reset ldbar at thread_imc_event_del(). Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com> Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-05-03powerpc/perf: Fix loop exit condition in nest_imc_event_initAnju T Sudhakar
The data structure (i.e struct imc_mem_info) to hold the memory address information for nest imc units is allocated based on the number of nodes in the system. nest_imc_event_init() traverse this struct array to calculate the memory base address for the event-cpu. If we fail to find a match for the event cpu's chip-id in imc_mem_info struct array, then the do-while loop will iterate until we crash. Fix this by changing the loop exit condition based on the number of non zero vbase elements in the array, since the allocation is done for nr_chips + 1. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Fixes: 885dcd709ba91 ("powerpc/perf: Add nest IMC PMU support") Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com> Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-05-03powerpc/perf: Return accordingly on invalid chip-id inAnju T Sudhakar
Nest hardware counter memory resides in a per-chip reserve-memory. During nest_imc_event_init(), chip-id of the event-cpu is considered to calculate the base memory addresss for that cpu. Return, proper error condition if the chip_id calculated is invalid. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Fixes: 885dcd709ba91 ("powerpc/perf: Add nest IMC PMU support") Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-01-21perf/core, arch/powerpc: use PERF_PMU_CAP_NO_EXCLUDE for exclusion incapable ↵Andrew Murray
PMUs For PowerPC PMUs that do not support context exclusion let's advertise the PERF_PMU_CAP_NO_EXCLUDE capability. This ensures that perf will prevent us from handling events where any exclusion flags are set. Let's also remove the now unnecessary check for exclusion flags. Signed-off-by: Andrew Murray <andrew.murray@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Matt Turner <mattst88@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Richard Henderson <rth@twiddle.net> Cc: Russell King <linux@armlinux.org.uk> Cc: Sascha Hauer <s.hauer@pengutronix.de> Cc: Shawn Guo <shawnguo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linuxppc-dev@lists.ozlabs.org Cc: robin.murphy@arm.com Cc: suzuki.poulose@arm.com Link: https://lkml.kernel.org/r/1547128414-50693-10-git-send-email-andrew.murray@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-11-25powerpc/perf: Declare static identifier a suchBreno Leitao
There are three symbols (two variables and a function) that are being used solely in the same file (imc-pmu.c), thus, these symbols should be static, but they are not. This was detected by sparse: arch/powerpc/perf/imc-pmu.c:31:20: warning: symbol 'nest_imc_refc' was not declared. Should it be static? arch/powerpc/perf/imc-pmu.c:37:20: warning: symbol 'core_imc_refc' was not declared. Should it be static? arch/powerpc/perf/imc-pmu.c:46:16: warning: symbol 'imc_event_to_pmu' was not declared. Should it be static? This patch simply adds the 'static' storage-class definition to these symbols, thus, restricting their usage only in the imc-pmu.c file. Signed-off-by: Breno Leitao <leitao@debian.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-13powerpc/perf: Quiet IMC PMU registration messageJoel Stanley
On a Power9 box we get a few screens full of these on boot. Drop them to pr_debug. [ 5.993645] nest_centaur6_imc performance monitor hardware support registered [ 5.993728] nest_centaur7_imc performance monitor hardware support registered [ 5.996510] core_imc performance monitor hardware support registered [ 5.996569] nest_mba0_imc performance monitor hardware support registered [ 5.996631] nest_mba1_imc performance monitor hardware support registered [ 5.996685] nest_mba2_imc performance monitor hardware support registered Signed-off-by: Joel Stanley <joel@jms.id.au> Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Reviewed-by: Stewart Smith <stewart@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-08powerpc/perf: Remove sched_task function defined for thread-imcAnju T Sudhakar
Call trace observed while running perf-fuzzer: CPU: 43 PID: 9088 Comm: perf_fuzzer Not tainted 4.13.0-32-generic #35~lp1746225 task: c000003f776ac900 task.stack: c000003f77728000 NIP: c000000000299b70 LR: c0000000002a4534 CTR: c00000000029bb80 REGS: c000003f7772b760 TRAP: 0700 Not tainted (4.13.0-32-generic) MSR: 900000000282b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 24008822 XER: 00000000 CFAR: c000000000299a70 SOFTE: 0 GPR00: c0000000002a4534 c000003f7772b9e0 c000000001606200 c000003fef858908 GPR04: c000003f776ac900 0000000000000001 ffffffffffffffff 0000003fee730000 GPR08: 0000000000000000 0000000000000000 c0000000011220d8 0000000000000002 GPR12: c00000000029bb80 c000000007a3d900 0000000000000000 0000000000000000 GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR20: 0000000000000000 0000000000000000 c000003f776ad090 c000000000c71354 GPR24: c000003fef716780 0000003fee730000 c000003fe69d4200 c000003f776ad330 GPR28: c0000000011220d8 0000000000000001 c0000000014c6108 c000003fef858900 NIP [c000000000299b70] perf_pmu_sched_task+0x170/0x180 LR [c0000000002a4534] __perf_event_task_sched_in+0xc4/0x230 Call Trace: perf_iterate_sb+0x158/0x2a0 (unreliable) __perf_event_task_sched_in+0xc4/0x230 finish_task_switch+0x21c/0x310 __schedule+0x304/0xb80 schedule+0x40/0xc0 do_wait+0x254/0x2e0 kernel_wait4+0xa0/0x1a0 SyS_wait4+0x64/0xc0 system_call+0x58/0x6c Instruction dump: 3beafea0 7faa4800 409eff18 e8010060 eb610028 ebc10040 7c0803a6 38210050 eb81ffe0 eba1ffe8 ebe1fff8 4e800020 <0fe00000> 4bffffbc 60000000 60420000 ---[ end trace 8c46856d314c1811 ]--- The context switch call-backs for thread-imc are defined in sched_task function. So when thread-imc events are grouped with software pmu events, perf_pmu_sched_task hits the WARN_ON_ONCE condition, since software PMUs are assumed not to have a sched_task defined. Patch to move the thread_imc enable/disable opal call back from sched_task to event_[add/del] function Fixes: f74c89bd80fb ("powerpc/perf: Add thread IMC PMU support") Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com> Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Tested-by: Joel Stanley <joel@jms.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03powerpc/perf: Unregister thread-imc if core-imc not supportedAnju T Sudhakar
Since thread-imc internally use the core-imc hardware infrastructure and is depended on it, having thread-imc in the kernel in the absence of core-imc is trivial. Patch disables thread-imc, if core-imc is not registered. Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com> Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03powerpc/perf: Return appropriate value for unknown domainAnju T Sudhakar
Return proper error code for unknown domain during IMC initialization. Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com> Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03powerpc/perf: Replace the direct return with goto statementAnju T Sudhakar
Replace the direct return statement in imc_mem_init() with goto, to adhere to the kernel coding style. Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com> Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03powerpc/perf: Rearrange memory freeing in imc initAnju T Sudhakar
When any of the IMC (In-Memory Collection counter) devices fail to initialize, imc_common_mem_free() frees set of memory. In doing so, pmu_ptr pointer is also freed. But pmu_ptr pointer is used in subsequent function (imc_common_cpuhp_mem_free()) which is wrong. Patch here reorders the code to avoid such access. Also free the memory which is dynamically allocated during imc initialization, wherever required. Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com> Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-18powerpc/perf: Fix memory allocation for core-imc based on num_possible_cpus()Anju T Sudhakar
Currently memory is allocated for core-imc based on cpu_present_mask, which has bit 'cpu' set iff cpu is populated. We use (cpu number / threads per core) as the array index to access the memory. Under some circumstances firmware marks a CPU as GUARDed CPU and boot the system, until cleared of errors, these CPU's are unavailable for all subsequent boots. GUARDed CPUs are possible but not present from linux view, so it blows a hole when we assume the max length of our allocation is driven by our max present cpus, where as one of the cpus might be online and be beyond the max present cpus, due to the hole. So (cpu number / threads per core) value bounds the array index and leads to memory overflow. Call trace observed during a guard test: Faulting instruction address: 0xc000000000149f1c cpu 0x69: Vector: 380 (Data Access Out of Range) at [c000003fea303420] pc:c000000000149f1c: prefetch_freepointer+0x14/0x30 lr:c00000000014e0f8: __kmalloc+0x1a8/0x1ac sp:c000003fea3036a0 msr:9000000000009033 dar:c9c54b2c91dbf6b7 current = 0xc000003fea2c0000 paca = 0xc00000000fddd880 softe: 3 irq_happened: 0x01 pid = 1, comm = swapper/104 Linux version 4.16.7-openpower1 (smc@smc-desktop) (gcc version 6.4.0 (Buildroot 2018.02.1-00006-ga8d1126)) #2 SMP Fri May 4 16:44:54 PDT 2018 enter ? for help call trace: __kmalloc+0x1a8/0x1ac (unreliable) init_imc_pmu+0x7f4/0xbf0 opal_imc_counters_probe+0x3fc/0x43c platform_drv_probe+0x48/0x80 driver_probe_device+0x22c/0x308 __driver_attach+0xa0/0xd8 bus_for_each_dev+0x88/0xb4 driver_attach+0x2c/0x40 bus_add_driver+0x1e8/0x228 driver_register+0xd0/0x114 __platform_driver_register+0x50/0x64 opal_imc_driver_init+0x24/0x38 do_one_initcall+0x150/0x15c kernel_init_freeable+0x250/0x254 kernel_init+0x1c/0x150 ret_from_kernel_thread+0x5c/0xc8 Allocating memory for core-imc based on cpu_possible_mask, which has bit 'cpu' set iff cpu is populatable, will fix this issue. Reported-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com> Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com> Reviewed-by: Balbir Singh <bsingharora@gmail.com> Tested-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com> Fixes: 39a846db1d57 ("powerpc/perf: Add core IMC PMU support") Cc: stable@vger.kernel.org # v4.14+ Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-21Merge branch 'fixes' into nextMichael Ellerman
Merge our fixes branch from the 4.15 cycle. Unusually the fixes branch saw some significant features merged, notably the RFI flush patches, so we want the code in next to be tested against that, to avoid any surprises when the two are merged. There's also some other work on the panic handling that was reverted in fixes and we now want to do properly in next, which would conflict. And we also fix a few other minor merge conflicts.
2018-01-19powerpc/perf: Change the data type for the variable 'ncpu' in IMC codeAnju T Sudhakar
Change the data type for the variable 'ncpu' in ppc_core_imc_cpu_offline(), since cpumask_any_but() returns an 'int' value. Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com> Reported-by: David Binderman <dcb314@hotmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-19powerpc/perf: Pass struct imc_events as a parameter to imc_parse_event()Anju T Sudhakar
Remove the allocation of struct imc_events from imc_parse_event(). Instead pass imc_events as a parameter to imc_parse_event(), which is a pointer to a slot in the array allocated in update_events_in_group(). Reported-by: Dan Carpenter ("powerpc/perf: Fix a sizeof() typo so we allocate less memory") Suggested-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-19powerpc/perf: IMC code cleanup with some code refactoringAnju T Sudhakar
Factor out memory freeing part for attribute elements from imc_common_cpuhp_mem_free(). Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com> Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-19powerpc/perf: Remove thread_imc_pmu global variable fromAnju T Sudhakar
Remove the global variable 'thread_imc_pmu', since it is not used in the code. Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com> Reviewed-by: madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-12-13powerpc/perf: Fix kfree memory allocated for nest pmusAnju T Sudhakar
imc_common_cpuhp_mem_free() is the common function for all IMC (In-memory Collection counters) domains to unregister cpuhotplug callback and free memory. Since kfree of memory allocated for nest-imc (per_nest_pmu_arr) is in the common code, all domains (core/nest/thread) can do the kfree in the failure case. This could potentially create a call trace as shown below, where core(/thread/nest) imc pmu initialization fails and in the failure path imc_common_cpuhp_mem_free() free the memory(per_nest_pmu_arr), which is allocated by successfully registered nest units. The call trace is generated in a scenario where core-imc initialization is made to fail and a cpuhotplug is performed in a p9 system. During cpuhotplug ppc_nest_imc_cpu_offline() tries to access per_nest_pmu_arr, which is already freed by core-imc. NIP [c000000000cb6a94] mutex_lock+0x34/0x90 LR [c000000000cb6a88] mutex_lock+0x28/0x90 Call Trace: mutex_lock+0x28/0x90 (unreliable) perf_pmu_migrate_context+0x90/0x3a0 ppc_nest_imc_cpu_offline+0x190/0x1f0 cpuhp_invoke_callback+0x160/0x820 cpuhp_thread_fun+0x1bc/0x270 smpboot_thread_fn+0x250/0x290 kthread+0x1a8/0x1b0 ret_from_kernel_thread+0x5c/0x74 To address this scenario do the kfree(per_nest_pmu_arr) only in case of nest-imc initialization failure, and when there is no other nest units registered. Fixes: 73ce9aec65b1 ("powerpc/perf: Fix IMC_MAX_PMU macro") Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com> Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-12-13powerpc/perf/imc: Fix nest-imc cpuhotplug callback failureAnju T Sudhakar
Oops is observed during boot: Faulting instruction address: 0xc000000000248340 cpu 0x0: Vector: 380 (Data Access Out of Range) at [c000000ff66fb850] pc: c000000000248340: event_function_call+0x50/0x1f0 lr: c00000000024878c: perf_remove_from_context+0x3c/0x100 sp: c000000ff66fbad0 msr: 9000000000009033 dar: 7d20e2a6f92d03c0 pid = 14, comm = cpuhp/0 While registering the cpuhotplug callbacks for nest-imc, if we fail in the cpuhotplug online path for any random node in a multi node system (because the opal call to stop nest-imc counters fails for that node), ppc_nest_imc_cpu_offline() will get invoked for other nodes who successfully returned from cpuhotplug online path. This call trace is generated since in the ppc_nest_imc_cpu_offline() path we are trying to migrate the event context, when nest-imc counters are not even initialized. Patch to add a check to ensure that nest-imc is registered before migrating the event context. Fixes: 885dcd709ba9 ("powerpc/perf: Add nest IMC PMU support") Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com> Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-22powerpc/perf: Fix IMC_MAX_PMU macroMadhavan Srinivasan
IMC_MAX_PMU is used for static storage (per_nest_pmu_arr) which holds nest pmu information. Current value for the macro is 32 based on the initial number of nest pmu units supported by the nest microcode. But going forward, microcode could support more nest units. Instead of static storage, patch to fix the code to dynamically allocate an array based on the number of nest imc units found in the device tree. Fixes:8f95faaac56c1 ('powerpc/powernv: Detect and create IMC device') Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-22powerpc/perf/imc: Use cpu_to_node() not topology_physical_package_id()Michael Ellerman
init_imc_pmu() uses topology_physical_package_id() to detect the node id of the processor it is on to get local memory, but that's wrong, and can lead to crashes. Fix it to use cpu_to_node(). Fixes: 885dcd709ba9 ("powerpc/perf: Add nest IMC PMU support") Cc: stable@vger.kernel.org # v4.14+ Reported-By: Rob Lippert <rlippert@google.com> Tested-By: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-03powerpc/perf: Fix core-imc hotplug callback failure during imc initializationMadhavan Srinivasan
Call trace observed during boot: nest_capp0_imc performance monitor hardware support registered nest_capp1_imc performance monitor hardware support registered core_imc memory allocation for cpu 56 failed Unable to handle kernel paging request for data at address 0xffa400010 Faulting instruction address: 0xc000000000bf3294 0:mon> e cpu 0x0: Vector: 300 (Data Access) at [c000000ff38ff8d0] pc: c000000000bf3294: mutex_lock+0x34/0x90 lr: c000000000bf3288: mutex_lock+0x28/0x90 sp: c000000ff38ffb50 msr: 9000000002009033 dar: ffa400010 dsisr: 80000 current = 0xc000000ff383de00 paca = 0xc000000007ae0000 softe: 0 irq_happened: 0x01 pid = 13, comm = cpuhp/0 Linux version 4.11.0-39.el7a.ppc64le (mockbuild@ppc-058.build.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-16) (GCC) ) #1 SMP Tue Oct 3 07:42:44 EDT 2017 0:mon> t [c000000ff38ffb80] c0000000002ddfac perf_pmu_migrate_context+0xac/0x470 [c000000ff38ffc40] c00000000011385c ppc_core_imc_cpu_offline+0x1ac/0x1e0 [c000000ff38ffc90] c000000000125758 cpuhp_invoke_callback+0x198/0x5d0 [c000000ff38ffd00] c00000000012782c cpuhp_thread_fun+0x8c/0x3d0 [c000000ff38ffd60] c0000000001678d0 smpboot_thread_fn+0x290/0x2a0 [c000000ff38ffdc0] c00000000015ee78 kthread+0x168/0x1b0 [c000000ff38ffe30] c00000000000b368 ret_from_kernel_thread+0x5c/0x74 While registering the cpuhoplug callbacks for core-imc, if we fails in the cpuhotplug online path for any random core (either because opal call to initialize the core-imc counters fails or because memory allocation fails for that core), ppc_core_imc_cpu_offline() will get invoked for other cpus who successfully returned from cpuhotplug online path. But in the ppc_core_imc_cpu_offline() path we are trying to migrate the event context, when core-imc counters are not even initialized. Thus creating the above stack dump. Add a check to see if core-imc counters are enabled or not in the cpuhotplug offline path before migrating the context to handle this failing scenario. Fixes: 885dcd709ba9 ("powerpc/perf: Add nest IMC PMU support") Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-10-25powerpc/perf: Fix IMC allocation routineGuilherme G. Piccoli
When setting nr_cpus=1, we observed a crash in IMC code during boot due to a missing allocation: basically, IMC code is taking the number of threads into account in imc_mem_init() and if we manually set nr_cpus for a value that is not multiple of the number of threads per core, an integer division in that function will discard the decimal portion, leading IMC to not allocate one mem_info struct. This causes a NULL pointer dereference later, on is_core_imc_mem_inited(). This patch just rounds that division up, fixing the bug. Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com> Acked-by: Anju T Sudhakar <anju@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-10-13powerpc/perf: Fix IMC initialization crashAnju T Sudhakar
Panic observed with latest firmware, and upstream kernel: NIP init_imc_pmu+0x8c/0xcf0 LR init_imc_pmu+0x2f8/0xcf0 Call Trace: init_imc_pmu+0x2c8/0xcf0 (unreliable) opal_imc_counters_probe+0x300/0x400 platform_drv_probe+0x64/0x110 driver_probe_device+0x3d8/0x580 __driver_attach+0x14c/0x1a0 bus_for_each_dev+0x8c/0xf0 driver_attach+0x34/0x50 bus_add_driver+0x298/0x350 driver_register+0x9c/0x180 __platform_driver_register+0x5c/0x70 opal_imc_driver_init+0x2c/0x40 do_one_initcall+0x64/0x1d0 kernel_init_freeable+0x280/0x374 kernel_init+0x24/0x160 ret_from_kernel_thread+0x5c/0x74 While registering nest imc at init, cpu-hotplug callback nest_pmu_cpumask_init() makes an OPAL call to stop the engine. And if the OPAL call fails, imc_common_cpuhp_mem_free() is invoked to cleanup memory and cpuhotplug setup. But when cleaning up the attribute group, we are dereferencing the attribute element array without checking whether the backing element is not NULL. This causes the kernel panic. Add a check for the backing element prior to dereferencing the attribute element, to handle the failing case gracefully. Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com> Reported-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com> [mpe: Trim change log] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-10-12powerpc/perf: Add ___GFP_NOWARN flag to alloc_pages_node()Anju T Sudhakar
Stack trace output during a stress test: [ 4.310049] Freeing initrd memory: 22592K [ 4.310646] rtas_flash: no firmware flash support [ 4.313341] cpuhp/64: page allocation failure: order:0, mode:0x14480c0(GFP_KERNEL|__GFP_ZERO|__GFP_THISNODE), nodemask=(null) [ 4.313465] cpuhp/64 cpuset=/ mems_allowed=0 [ 4.313521] CPU: 64 PID: 392 Comm: cpuhp/64 Not tainted 4.11.0-39.el7a.ppc64le #1 [ 4.313588] Call Trace: [ 4.313622] [c000000f1fb1b8e0] [c000000000c09388] dump_stack+0xb0/0xf0 (unreliable) [ 4.313694] [c000000f1fb1b920] [c00000000030ef6c] warn_alloc+0x12c/0x1c0 [ 4.313753] [c000000f1fb1b9c0] [c00000000030ff68] __alloc_pages_nodemask+0xea8/0x1000 [ 4.313823] [c000000f1fb1bbb0] [c000000000113a8c] core_imc_mem_init+0xbc/0x1c0 [ 4.313892] [c000000f1fb1bc00] [c000000000113cdc] ppc_core_imc_cpu_online+0x14c/0x170 [ 4.313962] [c000000f1fb1bc90] [c000000000125758] cpuhp_invoke_callback+0x198/0x5d0 [ 4.314031] [c000000f1fb1bd00] [c00000000012782c] cpuhp_thread_fun+0x8c/0x3d0 [ 4.314101] [c000000f1fb1bd60] [c0000000001678d0] smpboot_thread_fn+0x290/0x2a0 [ 4.314169] [c000000f1fb1bdc0] [c00000000015ee78] kthread+0x168/0x1b0 [ 4.314229] [c000000f1fb1be30] [c00000000000b368] ret_from_kernel_thread+0x5c/0x74 [ 4.314313] Mem-Info: [ 4.314356] active_anon:0 inactive_anon:0 isolated_anon:0 core_imc_mem_init() at system boot use alloc_pages_node() to get memory and alloc_pages_node() throws this stack dump when tried to allocate memory from a node which has no memory behind it. Add a ___GFP_NOWARN flag in allocation request as a fix. Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com> Reported-by: Michael Ellerman <mpe@ellerman.id.au> Reported-by: Venkat R.B <venkatb3@in.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-10-12powerpc/perf: Fix for core/nest imc call trace on cpuhotplugAnju T Sudhakar
Nest/core pmu units are enabled only when it is used. A reference count is maintained for the events which uses the nest/core pmu units. Currently in *_imc_counters_release function a WARN() is used for notification of any underflow of ref count. The case where event ref count hit a negative value is, when perf session is started, followed by offlining of all cpus in a given core. i.e. in cpuhotplug offline path ppc_core_imc_cpu_offline() function set the ref->count to zero, if the current cpu which is about to offline is the last cpu in a given core and make an OPAL call to disable the engine in that core. And on perf session termination, perf->destroy (core_imc_counters_release) will first decrement the ref->count for this core and based on the ref->count value an opal call is made to disable the core-imc engine. Now, since cpuhotplug path already clears the ref->count for core and disabled the engine, perf->destroy() decrementing again at event termination make it negative which in turn fires the WARN_ON. The same happens for nest units. Add a check to see if the reference count is alreday zero, before decrementing the count, so that the ref count will not hit a negative value. Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com> Reviewed-by: Santosh Sivaraj <santosh@fossix.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-17powerpc/perf: Fix usage of nest_imc_refcMadhavan Srinivasan
nest_imc_refc is a reference count struct, used to track number of active perf sessions using the nest units. Currently the code accesses nest_imc_refc using node_id, which is incorrect, the array is indexed by node number. Meaning in the case of sparse node ids we index off the end of the array. Fix it to use get_nest_pmu_ref() which uses the existing per-cpu variable local_nest_imc_refc. Fixes: 885dcd709ba91 ('powerpc/perf: Add nest IMC PMU support') Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> [mpe: Tweak change log] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-15powerpc/perf/imc: Fix nest events on muti socket systemAnju T
In a multi node system with discontiguous node ids, nest event values are not showing up properly. eg. lscpu output: NUMA node0 CPU(s): 0-15 NUMA node8 CPU(s): 16-31 Nest event values on such systems can be counted on CPUs <= 15: $./perf stat -e 'nest_powerbus0_imc/PM_PB_CYC/' -C 0-14 -I 1000 sleep 1000 # time counts unit events 1.000294577 30,17,24,42,880 nest_powerbus0_imc/PM_PB_CYC/ But not on CPUs >= 16: $./perf stat -e 'nest_powerbus0_imc/PM_PB_CYC/' -C 16-28 -I 1000 sleep 1000 # time counts unit events 1.000049902 <not supported> nest_powerbus0_imc/PM_PB_CYC/ This is because, when fetching the reference count, the node id (which may be sparse) is used as the array index, not the node number (which is 0 based and contiguous). Fix it by using the node number as the array index. $./perf stat -e 'nest_powerbus0_imc/PM_PB_CYC/' -C 16-28 -I 1000 sleep 1000 # time counts unit events 1.000241961 26,12,35,28,704 nest_powerbus0_imc/PM_PB_CYC/ Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com> [mpe: Change log tweaks for clarity and brevity] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>