Age | Commit message (Collapse) | Author |
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
- Improve uclamp performance by using a static key for the fast path
- Add the "sched_util_clamp_min_rt_default" sysctl, to optimize for
better power efficiency of RT tasks on battery powered devices.
(The default is to maximize performance & reduce RT latencies.)
- Improve utime and stime tracking accuracy, which had a fixed boundary
of error, which created larger and larger relative errors as the
values become larger. This is now replaced with more precise
arithmetics, using the new mul_u64_u64_div_u64() helper in math64.h.
- Improve the deadline scheduler, such as making it capacity aware
- Improve frequency-invariant scheduling
- Misc cleanups in energy/power aware scheduling
- Add sched_update_nr_running tracepoint to track changes to nr_running
- Documentation additions and updates
- Misc cleanups and smaller fixes
* tag 'sched-core-2020-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (54 commits)
sched/doc: Factorize bits between sched-energy.rst & sched-capacity.rst
sched/doc: Document capacity aware scheduling
sched: Document arch_scale_*_capacity()
arm, arm64: Fix selection of CONFIG_SCHED_THERMAL_PRESSURE
Documentation/sysctl: Document uclamp sysctl knobs
sched/uclamp: Add a new sysctl to control RT default boost value
sched/uclamp: Fix a deadlock when enabling uclamp static key
sched: Remove duplicated tick_nohz_full_enabled() check
sched: Fix a typo in a comment
sched/uclamp: Remove unnecessary mutex_init()
arm, arm64: Select CONFIG_SCHED_THERMAL_PRESSURE
sched: Cleanup SCHED_THERMAL_PRESSURE kconfig entry
arch_topology, sched/core: Cleanup thermal pressure definition
trace/events/sched.h: fix duplicated word
linux/sched/mm.h: drop duplicated words in comments
smp: Fix a potential usage of stale nr_cpus
sched/fair: update_pick_idlest() Select group with lowest group_util when idle_cpus are equal
sched: nohz: stop passing around unused "ticks" parameter.
sched: Better document ttwu()
sched: Add a tracepoint to track rq->nr_running
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf event updates from Ingo Molnar:
"HW support updates:
- Add uncore support for Intel Comet Lake
- Add RAPL support for Hygon Fam18h
- Add Intel "IIO stack to PMON mapping" support on Skylake-SP CPUs,
which enumerates per device performance counters via sysfs and
enables the perf stat --iiostat functionality
- Add support for Intel "Architectural LBRs", which generalized the
model specific LBR hardware tracing feature into a
model-independent, architected performance monitoring feature.
Usage is mostly seamless to tooling, as the pre-existing LBR
features are kept, but there's a couple of advantages under the
hood, such as faster context-switching, faster LBR reads, cleaner
exposure of LBR features to guest kernels, etc.
( Since architectural LBRs are supported via XSAVE, there's related
changes to the x86 FPU code as well. )
ftrace/perf updates:
- Add support to add a text poke event to record changes to kernel
text (i.e. self-modifying code) in order to support tracers like
Intel PT decoding through jump labels, kprobes and ftrace
trampolines.
Misc cleanups, smaller fixes..."
* tag 'perf-core-2020-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (47 commits)
perf/x86/rapl: Add Hygon Fam18h RAPL support
kprobes: Remove unnecessary module_mutex locking from kprobe_optimizer()
x86/perf: Fix a typo
perf: <linux/perf_event.h>: drop a duplicated word
perf/x86/intel/lbr: Support XSAVES for arch LBR read
perf/x86/intel/lbr: Support XSAVES/XRSTORS for LBR context switch
x86/fpu/xstate: Add helpers for LBR dynamic supervisor feature
x86/fpu/xstate: Support dynamic supervisor feature for LBR
x86/fpu: Use proper mask to replace full instruction mask
perf/x86: Remove task_ctx_size
perf/x86/intel/lbr: Create kmem_cache for the LBR context data
perf/core: Use kmem_cache to allocate the PMU specific data
perf/core: Factor out functions to allocate/free the task_ctx_data
perf/x86/intel/lbr: Support Architectural LBR
perf/x86/intel/lbr: Factor out intel_pmu_store_lbr
perf/x86/intel/lbr: Factor out rdlbr_all() and wrlbr_all()
perf/x86/intel/lbr: Mark the {rd,wr}lbr_{to,from} wrappers __always_inline
perf/x86/intel/lbr: Unify the stored format of LBR information
perf/x86/intel/lbr: Support LBR_CTL
perf/x86: Expose CPUID enumeration bits for arch LBR
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:
- LKMM updates: mostly documentation changes, but also some new litmus
tests for atomic ops.
- KCSAN updates: the most important change is that GCC 11 now has all
fixes in place to support KCSAN, so GCC support can be enabled again.
Also more annotations.
- futex updates: minor cleanups and simplifications
- seqlock updates: merge preparatory changes/cleanups for the
'associated locks' facilities.
- lockdep updates:
- simplify IRQ trace event handling
- add various new debug checks
- simplify header dependencies, split out <linux/lockdep_types.h>,
decouple lockdep from other low level headers some more
- fix NMI handling
- misc cleanups and smaller fixes
* tag 'locking-core-2020-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (60 commits)
kcsan: Improve IRQ state trace reporting
lockdep: Refactor IRQ trace events fields into struct
seqlock: lockdep assert non-preemptibility on seqcount_t write
lockdep: Add preemption enabled/disabled assertion APIs
seqlock: Implement raw_seqcount_begin() in terms of raw_read_seqcount()
seqlock: Add kernel-doc for seqcount_t and seqlock_t APIs
seqlock: Reorder seqcount_t and seqlock_t API definitions
seqlock: seqcount_t latch: End read sections with read_seqcount_retry()
seqlock: Properly format kernel-doc code samples
Documentation: locking: Describe seqlock design and usage
locking/qspinlock: Do not include atomic.h from qspinlock_types.h
locking/atomic: Move ATOMIC_INIT into linux/types.h
lockdep: Move list.h inclusion into lockdep.h
locking/lockdep: Fix TRACE_IRQFLAGS vs. NMIs
futex: Remove unused or redundant includes
futex: Consistently use fshared as boolean
futex: Remove needless goto's
futex: Remove put_futex_key()
rwsem: fix commas in initialisation
docs: locking: Replace HTTP links with HTTPS ones
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RCU updates from Ingo Molnar:
- kfree_rcu updates
- RCU tasks updates
- Read-side scalability tests
- SRCU updates
- Torture-test updates
- Documentation updates
- Miscellaneous fixes
* tag 'core-rcu-2020-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (109 commits)
torture: Remove obsolete "cd $KVM"
torture: Avoid duplicate specification of qemu command
torture: Dump ftrace at shutdown only if requested
torture: Add kvm-tranform.sh script for qemu-cmd files
torture: Add more tracing crib notes to kvm.sh
torture: Improve diagnostic for KCSAN-incapable compilers
torture: Correctly summarize build-only runs
torture: Pass --kmake-arg to all make invocations
rcutorture: Check for unwatched readers
torture: Abstract out console-log error detection
torture: Add a stop-run capability
torture: Create qemu-cmd in --buildonly runs
rcu/rcutorture: Replace 0 with false
torture: Add --allcpus argument to the kvm.sh script
torture: Remove whitespace from identify_qemu_vcpus output
rcutorture: NULL rcu_torture_current earlier in cleanup code
rcutorture: Handle non-statistic bang-string error messages
torture: Set configfile variable to current scenario
rcutorture: Add races with task-exit processing
locktorture: Use true and false to assign to bool variables
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Ingo Molnar:
"Fix a recent IRQ affinities regression, add in a missing debugfs
printout that helps the debugging of IRQ affinity logic bugs, and fix
a memory leak"
* tag 'irq-urgent-2020-08-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
genirq/debugfs: Add missing irqchip flags
genirq/affinity: Make affinity setting if activated opt-in
irqdomain/treewide: Free firmware node after domain removal
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 and cross-arch updates from Catalin Marinas:
"Here's a slightly wider-spread set of updates for 5.9.
Going outside the usual arch/arm64/ area is the removal of
read_barrier_depends() series from Will and the MSI/IOMMU ID
translation series from Lorenzo.
The notable arm64 updates include ARMv8.4 TLBI range operations and
translation level hint, time namespace support, and perf.
Summary:
- Removal of the tremendously unpopular read_barrier_depends()
barrier, which is a NOP on all architectures apart from Alpha, in
favour of allowing architectures to override READ_ONCE() and do
whatever dance they need to do to ensure address dependencies
provide LOAD -> LOAD/STORE ordering.
This work also offers a potential solution if compilers are shown
to convert LOAD -> LOAD address dependencies into control
dependencies (e.g. under LTO), as weakly ordered architectures will
effectively be able to upgrade READ_ONCE() to smp_load_acquire().
The latter case is not used yet, but will be discussed further at
LPC.
- Make the MSI/IOMMU input/output ID translation PCI agnostic,
augment the MSI/IOMMU ACPI/OF ID mapping APIs to accept an input ID
bus-specific parameter and apply the resulting changes to the
device ID space provided by the Freescale FSL bus.
- arm64 support for TLBI range operations and translation table level
hints (part of the ARMv8.4 architecture version).
- Time namespace support for arm64.
- Export the virtual and physical address sizes in vmcoreinfo for
makedumpfile and crash utilities.
- CPU feature handling cleanups and checks for programmer errors
(overlapping bit-fields).
- ACPI updates for arm64: disallow AML accesses to EFI code regions
and kernel memory.
- perf updates for arm64.
- Miscellaneous fixes and cleanups, most notably PLT counting
optimisation for module loading, recordmcount fix to ignore
relocations other than R_AARCH64_CALL26, CMA areas reserved for
gigantic pages on 16K and 64K configurations.
- Trivial typos, duplicate words"
Link: http://lkml.kernel.org/r/20200710165203.31284-1-will@kernel.org
Link: http://lkml.kernel.org/r/20200619082013.13661-1-lorenzo.pieralisi@arm.com
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (82 commits)
arm64: use IRQ_STACK_SIZE instead of THREAD_SIZE for irq stack
arm64/mm: save memory access in check_and_switch_context() fast switch path
arm64: sigcontext.h: delete duplicated word
arm64: ptrace.h: delete duplicated word
arm64: pgtable-hwdef.h: delete duplicated words
bus: fsl-mc: Add ACPI support for fsl-mc
bus/fsl-mc: Refactor the MSI domain creation in the DPRC driver
of/irq: Make of_msi_map_rid() PCI bus agnostic
of/irq: make of_msi_map_get_device_domain() bus agnostic
dt-bindings: arm: fsl: Add msi-map device-tree binding for fsl-mc bus
of/device: Add input id to of_dma_configure()
of/iommu: Make of_map_rid() PCI agnostic
ACPI/IORT: Add an input ID to acpi_dma_configure()
ACPI/IORT: Remove useless PCI bus walk
ACPI/IORT: Make iort_msi_map_rid() PCI agnostic
ACPI/IORT: Make iort_get_device_domain IRQ domain agnostic
ACPI/IORT: Make iort_match_node_callback walk the ACPI namespace for NC
arm64: enable time namespace support
arm64/vdso: Restrict splitting VVAR VMA
arm64/vdso: Handle faults on timens page
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rppt/linux
Pull unicore32 removal from Mike Rapoport:
"Remove unicore32 support.
The unicore32 port do not seem maintained for a long time now, there
is no upstream toolchain that can create unicore32 binaries and all
the links to prebuilt toolchains for unicore32 are dead. Even
compilers that were available are not supported by the kernel anymore.
Guenter Roeck says:
"I have stopped building unicore32 images since v4.19 since there is
no available compiler that is still supported by the kernel. I am
surprised that support for it has not been removed from the kernel"
However, it's worth pointing out two things:
- Guan Xuetao is still listed as maintainer and asked for the port to
be kept around the last time Arnd suggested removing it two years
ago. He promised that there would be compiler sources (presumably
llvm), but has not made those available since.
- https://github.com/gxt has patches to linux-4.9 and qemu-2.7, both
released in 2016, with patches dated early 2019. These patches
mainly restore a syscall ABI that was never part of mainline Linux
but apparently used in production. qemu-2.8 removed support for
that ABI and newer kernels (4.19+) can no longer be built with the
old toolchain, so apparently there will not be any future updates
to that git tree"
* tag 'rm-unicore32' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/linux:
MAINTAINERS: remove "PKUNITY SOC DRIVERS" entry
rtc: remove fb-puv3 driver
video: fbdev: remove fb-puv3 driver
pwm: remove pwm-puv3 driver
input: i8042: remove support for 8042-unicore32io
i2c/buses: remove i2c-puv3 driver
cpufreq: remove unicore32 driver
arch: remove unicore32 port
|
|
Pull core block updates from Jens Axboe:
"Good amount of cleanups and tech debt removals in here, and as a
result, the diffstat shows a nice net reduction in code.
- Softirq completion cleanups (Christoph)
- Stop using ->queuedata (Christoph)
- Cleanup bd claiming (Christoph)
- Use check_events, moving away from the legacy media change
(Christoph)
- Use inode i_blkbits consistently (Christoph)
- Remove old unused writeback congestion bits (Christoph)
- Cleanup/unify submission path (Christoph)
- Use bio_uninit consistently, instead of bio_disassociate_blkg
(Christoph)
- sbitmap cleared bits handling (John)
- Request merging blktrace event addition (Jan)
- sysfs add/remove race fixes (Luis)
- blk-mq tag fixes/optimizations (Ming)
- Duplicate words in comments (Randy)
- Flush deferral cleanup (Yufen)
- IO context locking/retry fixes (John)
- struct_size() usage (Gustavo)
- blk-iocost fixes (Chengming)
- blk-cgroup IO stats fixes (Boris)
- Various little fixes"
* tag 'for-5.9/block-20200802' of git://git.kernel.dk/linux-block: (135 commits)
block: blk-timeout: delete duplicated word
block: blk-mq-sched: delete duplicated word
block: blk-mq: delete duplicated word
block: genhd: delete duplicated words
block: elevator: delete duplicated word and fix typos
block: bio: delete duplicated words
block: bfq-iosched: fix duplicated word
iocost_monitor: start from the oldest usage index
iocost: Fix check condition of iocg abs_vdebt
block: Remove callback typedefs for blk_mq_ops
block: Use non _rcu version of list functions for tag_set_list
blk-cgroup: show global disk stats in root cgroup io.stat
blk-cgroup: make iostat functions visible to stat printing
block: improve discard bio alignment in __blkdev_issue_discard()
block: change REQ_OP_ZONE_RESET and REQ_OP_ZONE_RESET_ALL to be odd numbers
block: defer flush request no matter whether we have elevator
block: make blk_timeout_init() static
block: remove retry loop in ioc_release_fn()
block: remove unnecessary ioc nested locking
block: integrate bd_start_claiming into __blkdev_get
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto updates from Herbert Xu:
"API:
- Add support for allocating transforms on a specific NUMA Node
- Introduce the flag CRYPTO_ALG_ALLOCATES_MEMORY for storage users
Algorithms:
- Drop PMULL based ghash on arm64
- Fixes for building with clang on x86
- Add sha256 helper that does the digest in one go
- Add SP800-56A rev 3 validation checks to dh
Drivers:
- Permit users to specify NUMA node in hisilicon/zip
- Add support for i.MX6 in imx-rngc
- Add sa2ul crypto driver
- Add BA431 hwrng driver
- Add Ingenic JZ4780 and X1000 hwrng driver
- Spread IRQ affinity in inside-secure and marvell/cesa"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (157 commits)
crypto: sa2ul - Fix inconsistent IS_ERR and PTR_ERR
hwrng: core - remove redundant initialization of variable ret
crypto: x86/curve25519 - Remove unused carry variables
crypto: ingenic - Add hardware RNG for Ingenic JZ4780 and X1000
dt-bindings: RNG: Add Ingenic RNG bindings.
crypto: caam/qi2 - add module alias
crypto: caam - add more RNG hw error codes
crypto: caam/jr - remove incorrect reference to caam_jr_register()
crypto: caam - silence .setkey in case of bad key length
crypto: caam/qi2 - create ahash shared descriptors only once
crypto: caam/qi2 - fix error reporting for caam_hash_alloc
crypto: caam - remove deadcode on 32-bit platforms
crypto: ccp - use generic power management
crypto: xts - Replace memcpy() invocation with simple assignment
crypto: marvell/cesa - irq balance
crypto: inside-secure - irq balance
crypto: ecc - SP800-56A rev 3 local public key validation
crypto: dh - SP800-56A rev 3 local public key validation
crypto: dh - check validity of Z before export
lib/mpi: Add mpi_sub_ui()
...
|
|
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
That gives us ordering guarantees around the pair.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Pull networking fixes from David Miller:
1) Encap offset calculation is incorrect in esp6, from Sabrina Dubroca.
2) Better parameter validation in pfkey_dump(), from Mark Salyzyn.
3) Fix several clang issues on powerpc in selftests, from Tanner Love.
4) cmsghdr_from_user_compat_to_kern() uses the wrong length, from Al
Viro.
5) Out of bounds access in mlx5e driver, from Raed Salem.
6) Fix transfer buffer memleak in lan78xx, from Johan Havold.
7) RCU fixups in rhashtable, from Herbert Xu.
8) Fix ipv6 nexthop refcnt leak, from Xiyu Yang.
9) vxlan FDB dump must be done under RCU, from Ido Schimmel.
10) Fix use after free in mlxsw, from Ido Schimmel.
11) Fix map leak in HASH_OF_MAPS bpf code, from Andrii Nakryiko.
12) Fix bug in mac80211 Tx ack status reporting, from Vasanthakumar
Thiagarajan.
13) Fix memory leaks in IPV6_ADDRFORM code, from Cong Wang.
14) Fix bpf program reference count leaks in mlx5 during
mlx5e_alloc_rq(), from Xin Xiong.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (86 commits)
vxlan: fix memleak of fdb
rds: Prevent kernel-infoleak in rds_notify_queue_get()
net/sched: The error lable position is corrected in ct_init_module
net/mlx5e: fix bpf_prog reference count leaks in mlx5e_alloc_rq
net/mlx5e: E-Switch, Specify flow_source for rule with no in_port
net/mlx5e: E-Switch, Add misc bit when misc fields changed for mirroring
net/mlx5e: CT: Support restore ipv6 tunnel
net: gemini: Fix missing clk_disable_unprepare() in error path of gemini_ethernet_port_probe()
ionic: unlock queue mutex in error path
atm: fix atm_dev refcnt leaks in atmtcp_remove_persistent
net: ethernet: mtk_eth_soc: fix MTU warnings
net: nixge: fix potential memory leak in nixge_probe()
devlink: ignore -EOPNOTSUPP errors on dumpit
rxrpc: Fix race between recvmsg and sendmsg on immediate call failure
MAINTAINERS: Replace Thor Thayer as Altera Triple Speed Ethernet maintainer
selftests/bpf: fix netdevsim trap_flow_action_cookie read
ipv6: fix memory leaks on IPV6_ADDRFORM path
net/bpfilter: Initialize pos in __bpfilter_process_sockopt
igb: reinit_locked() should be called with rtnl_lock
e1000e: continue to init PHY even when failed to disable ULP
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux
Pull thread fix from Christian Brauner:
"A simple spelling fix for dequeue_synchronous_signal()"
* tag 'for-linus-2020-08-01' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
signal: fix typo in dequeue_synchronous_signal()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into locking/core
Pull v5.9 KCSAN bits from Paul E. McKenney.
Perhaps the most important change is that GCC 11 now has all fixes in place
to support KCSAN, so GCC support can be enabled again.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Rather that hide their purpose in some dark, damp corner of Documentation/,
add some documentation to the default implementations.
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200731192016.7484-2-valentin.schneider@arm.com
|
|
Daniel Borkmann says:
====================
pull-request: bpf 2020-07-31
The following pull-request contains BPF updates for your *net* tree.
We've added 5 non-merge commits during the last 21 day(s) which contain
a total of 5 files changed, 126 insertions(+), 18 deletions(-).
The main changes are:
1) Fix a map element leak in HASH_OF_MAPS map type, from Andrii Nakryiko.
2) Fix a NULL pointer dereference in __btf_resolve_helper_id() when no
btf_vmlinux is available, from Peilin Ye.
3) Init pos variable in __bpfilter_process_sockopt(), from Christoph Hellwig.
4) Fix a cgroup sockopt verifier test by specifying expected attach type,
from Jean-Philippe Brucker.
Note that when net gets merged into net-next later on, there is a small
merge conflict in kernel/bpf/btf.c between commit 5b801dfb7feb ("bpf: Fix
NULL pointer dereference in __btf_resolve_helper_id()") from the bpf tree
and commit 138b9a0511c7 ("bpf: Remove btf_id helpers resolving") from the
net-next tree.
Resolve as follows: remove the old hunk with the __btf_resolve_helper_id()
function. Change the btf_resolve_helper_id() so it actually tests for a
NULL btf_vmlinux and bails out:
int btf_resolve_helper_id(struct bpf_verifier_log *log,
const struct bpf_func_proto *fn, int arg)
{
int id;
if (fn->arg_type[arg] != ARG_PTR_TO_BTF_ID || !btf_vmlinux)
return -EINVAL;
id = fn->btf_id[arg];
if (!id || id > btf_vmlinux->nr_types)
return -EINVAL;
return id;
}
Let me know if you run into any others issues (CC'ing Jiri Olsa so he's in
the loop with regards to merge conflict resolution).
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
'for-next/cpufeature', 'for-next/acpi', 'for-next/perf', 'for-next/timens', 'for-next/msi-iommu' and 'for-next/trivial' into for-next/core
* for-next/misc:
: Miscellaneous fixes and cleanups
arm64: use IRQ_STACK_SIZE instead of THREAD_SIZE for irq stack
arm64/mm: save memory access in check_and_switch_context() fast switch path
recordmcount: only record relocation of type R_AARCH64_CALL26 on arm64.
arm64: Reserve HWCAP2_MTE as (1 << 18)
arm64/entry: deduplicate SW PAN entry/exit routines
arm64: s/AMEVTYPE/AMEVTYPER
arm64/hugetlb: Reserve CMA areas for gigantic pages on 16K and 64K configs
arm64: stacktrace: Move export for save_stack_trace_tsk()
smccc: Make constants available to assembly
arm64/mm: Redefine CONT_{PTE, PMD}_SHIFT
arm64/defconfig: Enable CONFIG_KEXEC_FILE
arm64: Document sysctls for emulated deprecated instructions
arm64/panic: Unify all three existing notifier blocks
arm64/module: Optimize module load time by optimizing PLT counting
* for-next/vmcoreinfo:
: Export the virtual and physical address sizes in vmcoreinfo
arm64/crash_core: Export TCR_EL1.T1SZ in vmcoreinfo
crash_core, vmcoreinfo: Append 'MAX_PHYSMEM_BITS' to vmcoreinfo
* for-next/cpufeature:
: CPU feature handling cleanups
arm64/cpufeature: Validate feature bits spacing in arm64_ftr_regs[]
arm64/cpufeature: Replace all open bits shift encodings with macros
arm64/cpufeature: Add remaining feature bits in ID_AA64MMFR2 register
arm64/cpufeature: Add remaining feature bits in ID_AA64MMFR1 register
arm64/cpufeature: Add remaining feature bits in ID_AA64MMFR0 register
* for-next/acpi:
: ACPI updates for arm64
arm64/acpi: disallow writeable AML opregion mapping for EFI code regions
arm64/acpi: disallow AML memory opregions to access kernel memory
* for-next/perf:
: perf updates for arm64
arm64: perf: Expose some new events via sysfs
tools headers UAPI: Update tools's copy of linux/perf_event.h
arm64: perf: Add cap_user_time_short
perf: Add perf_event_mmap_page::cap_user_time_short ABI
arm64: perf: Only advertise cap_user_time for arch_timer
arm64: perf: Implement correct cap_user_time
time/sched_clock: Use raw_read_seqcount_latch()
sched_clock: Expose struct clock_read_data
arm64: perf: Correct the event index in sysfs
perf/smmuv3: To simplify code for ioremap page in pmcg
* for-next/timens:
: Time namespace support for arm64
arm64: enable time namespace support
arm64/vdso: Restrict splitting VVAR VMA
arm64/vdso: Handle faults on timens page
arm64/vdso: Add time namespace page
arm64/vdso: Zap vvar pages when switching to a time namespace
arm64/vdso: use the fault callback to map vvar pages
* for-next/msi-iommu:
: Make the MSI/IOMMU input/output ID translation PCI agnostic, augment the
: MSI/IOMMU ACPI/OF ID mapping APIs to accept an input ID bus-specific parameter
: and apply the resulting changes to the device ID space provided by the
: Freescale FSL bus
bus: fsl-mc: Add ACPI support for fsl-mc
bus/fsl-mc: Refactor the MSI domain creation in the DPRC driver
of/irq: Make of_msi_map_rid() PCI bus agnostic
of/irq: make of_msi_map_get_device_domain() bus agnostic
dt-bindings: arm: fsl: Add msi-map device-tree binding for fsl-mc bus
of/device: Add input id to of_dma_configure()
of/iommu: Make of_map_rid() PCI agnostic
ACPI/IORT: Add an input ID to acpi_dma_configure()
ACPI/IORT: Remove useless PCI bus walk
ACPI/IORT: Make iort_msi_map_rid() PCI agnostic
ACPI/IORT: Make iort_get_device_domain IRQ domain agnostic
ACPI/IORT: Make iort_match_node_callback walk the ACPI namespace for NC
* for-next/trivial:
: Trivial fixes
arm64: sigcontext.h: delete duplicated word
arm64: ptrace.h: delete duplicated word
arm64: pgtable-hwdef.h: delete duplicated words
|
|
Conflicts:
arch/arm/include/asm/percpu.h
As Stephen Rothwell noted, there's a conflict between this commit
in locking/core:
a21ee6055c30 ("lockdep: Change hardirq{s_enabled,_context} to per-cpu variables")
and this fresh upstream commit:
aa54ea903abb ("ARM: percpu.h: fix build error")
a21ee6055c30 is a simpler solution to the dependency problem and doesn't
further increase header hell - so this conflict resolution effectively
reverts aa54ea903abb and uses the a21ee6055c30 solution.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
To improve the general usefulness of the IRQ state trace events with
KCSAN enabled, save and restore the trace information when entering and
exiting the KCSAN runtime as well as when generating a KCSAN report.
Without this, reporting the IRQ trace events (whether via a KCSAN report
or outside of KCSAN via a lockdep report) is rather useless due to
continuously being touched by KCSAN. This is because if KCSAN is
enabled, every instrumented memory access causes changes to IRQ trace
events (either by KCSAN disabling/enabling interrupts or taking
report_lock when generating a report).
Before "lockdep: Prepare for NMI IRQ state tracking", KCSAN avoided
touching the IRQ trace events via raw_local_irq_save/restore() and
lockdep_off/on().
Fixes: 248591f5d257 ("kcsan: Make KCSAN compatible with new IRQ state tracking")
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200729110916.3920464-2-elver@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Refactor the IRQ trace events fields, used for printing information
about the IRQ trace events, into a separate struct 'irqtrace_events'.
This improves readability by separating the information only used in
reporting, as well as enables (simplified) storing/restoring of
irqtrace_events snapshots.
No functional change intended.
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200729110916.3920464-1-elver@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu
Pull the v5.9 RCU bits from Paul E. McKenney:
- Documentation updates
- Miscellaneous fixes
- kfree_rcu updates
- RCU tasks updates
- Read-side scalability tests
- SRCU updates
- Torture-test updates
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Fix HASH_OF_MAPS bug of not putting inner map pointer on bpf_map_elem_update()
operation. This is due to per-cpu extra_elems optimization, which bypassed
free_htab_elem() logic doing proper clean ups. Make sure that inner map is put
properly in optimized case as well.
Fixes: 8c290e60fa2a ("bpf: fix hashmap extra_elems logic")
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20200729040913.2815687-1-andriin@fb.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit
Pull audit fixes from Paul Moore:
"One small audit fix that you can hopefully merge before v5.8 is
released. Unfortunately it is a revert of a patch that went in during
the v5.7 window and we just recently started to see some bug reports
relating to that commit.
We are working on a proper fix, but I'm not yet clear on when that
will be ready and we need to fix the v5.7 kernels anyway, so in the
interest of time a revert seemed like the best solution right now"
* tag 'audit-pr-20200729' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
revert: 1320a4052ea1 ("audit: trigger accompanying records when no rules present")
|
|
This modifies the first 32 bits out of the 128 bits of a random CPU's
net_rand_state on interrupt or CPU activity to complicate remote
observations that could lead to guessing the network RNG's internal
state.
Note that depending on some network devices' interrupt rate moderation
or binding, this re-seeding might happen on every packet or even almost
never.
In addition, with NOHZ some CPUs might not even get timer interrupts,
leaving their local state rarely updated, while they are running
networked processes making use of the random state. For this reason, we
also perform this update in update_process_times() in order to at least
update the state when there is user or system activity, since it's the
only case we care about.
Reported-by: Amit Klein <aksecurity@gmail.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Eric Dumazet <edumazet@google.com>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
present")
Unfortunately the commit listed in the subject line above failed
to ensure that the task's audit_context was properly initialized/set
before enabling the "accompanying records". Depending on the
situation, the resulting audit_context could have invalid values in
some of it's fields which could cause a kernel panic/oops when the
task/syscall exists and the audit records are generated.
We will revisit the original patch, with the necessary fixes, in a
future kernel but right now we just want to fix the kernel panic
with the least amount of added risk.
Cc: stable@vger.kernel.org
Fixes: 1320a4052ea1 ("audit: trigger accompanying records when no rules present")
Reported-by: j2468h@googlemail.com
Signed-off-by: Paul Moore <paul@paul-moore.com>
|
|
RT tasks by default run at the highest capacity/performance level. When
uclamp is selected this default behavior is retained by enforcing the
requested uclamp.min (p->uclamp_req[UCLAMP_MIN]) of the RT tasks to be
uclamp_none(UCLAMP_MAX), which is SCHED_CAPACITY_SCALE; the maximum
value.
This is also referred to as 'the default boost value of RT tasks'.
See commit 1a00d999971c ("sched/uclamp: Set default clamps for RT tasks").
On battery powered devices, it is desired to control this default
(currently hardcoded) behavior at runtime to reduce energy consumed by
RT tasks.
For example, a mobile device manufacturer where big.LITTLE architecture
is dominant, the performance of the little cores varies across SoCs, and
on high end ones the big cores could be too power hungry.
Given the diversity of SoCs, the new knob allows manufactures to tune
the best performance/power for RT tasks for the particular hardware they
run on.
They could opt to further tune the value when the user selects
a different power saving mode or when the device is actively charging.
The runtime aspect of it further helps in creating a single kernel image
that can be run on multiple devices that require different tuning.
Keep in mind that a lot of RT tasks in the system are created by the
kernel. On Android for instance I can see over 50 RT tasks, only
a handful of which created by the Android framework.
To control the default behavior globally by system admins and device
integrator, introduce the new sysctl_sched_uclamp_util_min_rt_default
to change the default boost value of the RT tasks.
I anticipate this to be mostly in the form of modifying the init script
of a particular device.
To avoid polluting the fast path with unnecessary code, the approach
taken is to synchronously do the update by traversing all the existing
tasks in the system. This could race with a concurrent fork(), which is
dealt with by introducing sched_post_fork() function which will ensure
the racy fork will get the right update applied.
Tested on Juno-r2 in combination with the RT capacity awareness [1].
By default an RT task will go to the highest capacity CPU and run at the
maximum frequency, which is particularly energy inefficient on high end
mobile devices because the biggest core[s] are 'huge' and power hungry.
With this patch the RT task can be controlled to run anywhere by
default, and doesn't cause the frequency to be maximum all the time.
Yet any task that really needs to be boosted can easily escape this
default behavior by modifying its requested uclamp.min value
(p->uclamp_req[UCLAMP_MIN]) via sched_setattr() syscall.
[1] 804d402fb6f6: ("sched/rt: Make RT capacity-aware")
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200716110347.19553-2-qais.yousef@arm.com
|
|
The following splat was caught when setting uclamp value of a task:
BUG: sleeping function called from invalid context at ./include/linux/percpu-rwsem.h:49
cpus_read_lock+0x68/0x130
static_key_enable+0x1c/0x38
__sched_setscheduler+0x900/0xad8
Fix by ensuring we enable the key outside of the critical section in
__sched_setscheduler()
Fixes: 46609ce22703 ("sched/uclamp: Protect uclamp fast path code with static key")
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200716110347.19553-4-qais.yousef@arm.com
|
|
In sched_update_tick_dependency() there's two calls that check
whether nohz_full is enabled: tick_nohz_full_cpu() does it
implicitly, while there's also an explicit call to tick_nohz_full_enabled().
Remove the duplicated, open coded check.
[ mingo: Amended the changelog. ]
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/1595935075-14223-1-git-send-email-linmiaohe@huawei.com
|
|
Since we already lock both kprobe_mutex and text_mutex in the optimizer,
text will not be changed and the module unloading will be stopped
inside kprobes_module_callback().
The mutex_lock() has originally been introduced to avoid conflict with text modification,
at that point we didn't hold text_mutex.
But after:
f1c6ece23729 ("kprobes: Fix potential deadlock in kprobe_optimizer()")
We started holding the text_mutex and don't need the modules mutex anyway.
So remove the module_mutex locking.
[ mingo: Amended the changelog. ]
Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Link: https://lore.kernel.org/r/20200728163400.e00b09c594763349f99ce6cb@kernel.org
|
|
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Recently introduced irqchip flags lack the corresponding printouts in
debugfs. Add them.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/874kpvydxc.wl-maz@kernel.org
|
|
John reported that on a RK3288 system the perf per CPU interrupts are all
affine to CPU0 and provided the analysis:
"It looks like what happens is that because the interrupts are not per-CPU
in the hardware, armpmu_request_irq() calls irq_force_affinity() while
the interrupt is deactivated and then request_irq() with IRQF_PERCPU |
IRQF_NOBALANCING.
Now when irq_startup() runs with IRQ_STARTUP_NORMAL, it calls
irq_setup_affinity() which returns early because IRQF_PERCPU and
IRQF_NOBALANCING are set, leaving the interrupt on its original CPU."
This was broken by the recent commit which blocked interrupt affinity
setting in hardware before activation of the interrupt. While this works in
general, it does not work for this particular case. As contrary to the
initial analysis not all interrupt chip drivers implement an activate
callback, the safe cure is to make the deferred interrupt affinity setting
at activation time opt-in.
Implement the necessary core logic and make the two irqchip implementations
for which this is required opt-in. In hindsight this would have been the
right thing to do, but ...
Fixes: baedb87d1b53 ("genirq/affinity: Handle affinity setting on inactive interrupts correctly")
Reported-by: John Keeping <john@metanate.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Marc Zyngier <maz@kernel.org>
Acked-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/87blk4tzgm.fsf@nanos.tec.linutronix.de
|
|
Prior to commit:
859d069ee1dd ("lockdep: Prepare for NMI IRQ state tracking")
IRQ state tracking was disabled in NMIs due to nmi_enter()
doing lockdep_off() -- with the obvious requirement that NMI entry
call nmi_enter() before trace_hardirqs_off().
[ AFAICT, PowerPC and SH violate this order on their NMI entry ]
However, that commit explicitly changed lockdep_hardirqs_*() to ignore
lockdep_off() and breaks every architecture that has irq-tracing in
it's NMI entry that hasn't been fixed up (x86 being the only fixed one
at this point).
The reason for this change is that by ignoring lockdep_off() we can:
- get rid of 'current->lockdep_recursion' in lockdep_assert_irqs*()
which was going to to give header-recursion issues with the
seqlock rework.
- allow these lockdep_assert_*() macros to function in NMI context.
Restore the previous state of things and allow an architecture to
opt-in to the NMI IRQ tracking support, however instead of relying on
lockdep_off(), rely on in_nmi(), both are part of nmi_enter() and so
over-all entry ordering doesn't need to change.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200727124852.GK119549@hirez.programming.kicks-ass.net
|
|
s/postive/positive/
Signed-off-by: Pavel Machek (CIP) <pavel@denx.de>
Link: https://lore.kernel.org/r/20200724090531.GA14409@amd
[christian.brauner@ubuntu.com: tweak commit message]
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into master
Pull uprobe fix from Ingo Molnar:
"Fix an interaction/regression between uprobes based shared library
tracing & GDB"
* tag 'perf-urgent-2020-07-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
uprobes: Change handle_swbp() to send SIGTRAP with si_code=SI_KERNEL, to fix GDB regression
|
|
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
The uclamp_mutex lock is initialized statically via DEFINE_MUTEX(),
it is unnecessary to initialize it runtime via mutex_init().
Signed-off-by: Qinglang Miao <miaoqinglang@huawei.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Patrick Bellasi <patrick.bellasi@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Link: https://lore.kernel.org/r/20200725085629.98292-1-miaoqinglang@huawei.com
|
|
GDB regression
If a tracee is uprobed and it hits int3 inserted by debugger, handle_swbp()
does send_sig(SIGTRAP, current, 0) which means si_code == SI_USER. This used
to work when this code was written, but then GDB started to validate si_code
and now it simply can't use breakpoints if the tracee has an active uprobe:
# cat test.c
void unused_func(void)
{
}
int main(void)
{
return 0;
}
# gcc -g test.c -o test
# perf probe -x ./test -a unused_func
# perf record -e probe_test:unused_func gdb ./test -ex run
GNU gdb (GDB) 10.0.50.20200714-git
...
Program received signal SIGTRAP, Trace/breakpoint trap.
0x00007ffff7ddf909 in dl_main () from /lib64/ld-linux-x86-64.so.2
(gdb)
The tracee hits the internal breakpoint inserted by GDB to monitor shared
library events but GDB misinterprets this SIGTRAP and reports a signal.
Change handle_swbp() to use force_sig(SIGTRAP), this matches do_int3_user()
and fixes the problem.
This is the minimal fix for -stable, arch/x86/kernel/uprobes.c is equally
wrong; it should use send_sigtrap(TRAP_TRACE) instead of send_sig(SIGTRAP),
but this doesn't confuse GDB and needs another x86-specific patch.
Reported-by: Aaron Merey <amerey@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20200723154420.GA32043@redhat.com
|
|
Since the default_wake_function() passes its flags onto
try_to_wake_up(), warn if those flags collide with internal values.
Given that the supplied flags are garbage, no repair can be done but at
least alert the user to the damage they are causing.
In the belief that these errors should be picked up during testing, the
warning is only compiled in under CONFIG_SCHED_DEBUG.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: https://lore.kernel.org/r/20200723201042.18861-1-chris@chris-wilson.co.uk
|
|
Only its reorder field is actually used now, so remove the struct and
embed @reorder directly in parallel_data.
No functional change, just a cleanup.
Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: linux-crypto@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
There's no reason to have two interfaces when there's only one caller.
Removing _possible saves text and simplifies future changes.
Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: linux-crypto@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
A padata instance has effective cpumasks that store the user-supplied
masks ANDed with the online mask, but this middleman is unnecessary.
parallel_data keeps the same information around. Removing this saves
text and code churn in future changes.
Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: linux-crypto@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
pd_setup_cpumasks() has only one caller. Move its contents inline to
prepare for the next cleanup.
Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: linux-crypto@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
padata_stop() has two callers and is unnecessary in both cases. When
pcrypt calls it before padata_free(), it's being unloaded so there are
no outstanding padata jobs[0]. When __padata_free() calls it, it's
either along the same path or else pcrypt initialization failed, which
of course means there are also no outstanding jobs.
Removing it simplifies padata and saves text.
[0] https://lore.kernel.org/linux-crypto/20191119225017.mjrak2fwa5vccazl@gondor.apana.org.au/
Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: linux-crypto@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
padata_start() is only used right after pcrypt allocates an instance
with all possible CPUs, when PADATA_INVALID can't happen, so there's no
need for a separate "start" step. It can be done during allocation to
save text, make using padata easier, and avoid unneeded calls in the
future.
Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: linux-crypto@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
The following commit:
14533a16c46d ("thermal/cpu-cooling, sched/core: Move the arch_set_thermal_pressure() API to generic scheduler code")
moved the definition of arch_set_thermal_pressure() to sched/core.c, but
kept its declaration in linux/arch_topology.h. When building e.g. an x86
kernel with CONFIG_SCHED_THERMAL_PRESSURE=y, cpufreq_cooling.c ends up
getting the declaration of arch_set_thermal_pressure() from
include/linux/arch_topology.h, which is somewhat awkward.
On top of this, sched/core.c unconditionally defines
o The thermal_pressure percpu variable
o arch_set_thermal_pressure()
while arch_scale_thermal_pressure() does nothing unless redefined by the
architecture.
arch_*() functions are meant to be defined by architectures, so revert the
aforementioned commit and re-implement it in a way that keeps
arch_set_thermal_pressure() architecture-definable, and doesn't define the
thermal pressure percpu variable for kernels that don't need
it (CONFIG_SCHED_THERMAL_PRESSURE=n).
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200712165917.9168-2-valentin.schneider@arm.com
|
|
The get_option() maybe return 0, it means that the nr_cpus is
not initialized. Then we will use the stale nr_cpus to initialize
the nr_cpu_ids. So fix it.
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200716070457.53255-1-songmuchun@bytedance.com
|
|
idle_cpus are equal
In slow path, when selecting idlest group, if both groups have type
group_has_spare, only idle_cpus count gets compared.
As a result, if multiple tasks are created in a tight loop,
and go back to sleep immediately
(while waiting for all tasks to be created),
they may be scheduled on the same core, because CPU is back to idle
when the new fork happen.
For example:
sudo perf record -e sched:sched_wakeup_new -- \
sysbench threads --threads=4 run
...
total number of events: 61582
...
sudo perf script
sysbench 129378 [006] 74586.633466: sched:sched_wakeup_new:
sysbench:129380 [120] success=1 CPU:007
sysbench 129378 [006] 74586.634718: sched:sched_wakeup_new:
sysbench:129381 [120] success=1 CPU:007
sysbench 129378 [006] 74586.635957: sched:sched_wakeup_new:
sysbench:129382 [120] success=1 CPU:007
sysbench 129378 [006] 74586.637183: sched:sched_wakeup_new:
sysbench:129383 [120] success=1 CPU:007
This may have negative impact on performance for workloads with frequent
creation of multiple threads.
In this patch we are using group_util to select idlest group if both groups
have equal number of idle_cpus. Comparing the number of idle cpu is
not enough in this case, because the newly forked thread sleeps
immediately and before we select the cpu for the next one.
This is shown in the trace where the same CPU7 is selected for
all wakeup_new events.
That's why, looking at utilization when there is the same number of
CPU is a good way to see where the previous task was placed. Using
nr_running doesn't solve the problem because the newly forked task is not
running and the cpu would not have been idle in this case and an idle
CPU would have been selected instead.
With this patch newly created tasks would be better distributed.
With this patch:
sudo perf record -e sched:sched_wakeup_new -- \
sysbench threads --threads=4 run
...
total number of events: 74401
...
sudo perf script
sysbench 129455 [006] 75232.853257: sched:sched_wakeup_new:
sysbench:129457 [120] success=1 CPU:008
sysbench 129455 [006] 75232.854489: sched:sched_wakeup_new:
sysbench:129458 [120] success=1 CPU:009
sysbench 129455 [006] 75232.855732: sched:sched_wakeup_new:
sysbench:129459 [120] success=1 CPU:010
sysbench 129455 [006] 75232.856980: sched:sched_wakeup_new:
sysbench:129460 [120] success=1 CPU:011
We tested this patch with following benchmarks:
master: 'commit b3a9e3b9622a ("Linux 5.8-rc1")'
100 iterations of: perf bench -f simple futex wake -s -t 128 -w 1
Lower result is better
| | BASELINE | +PATCH | DELTA (%) |
|---------|------------|----------|-------------|
| mean | 0.33 | 0.313 | +5.152 |
| std (%) | 10.433 | 7.563 | |
100 iterations of: sysbench threads --threads=8 run
Higher result is better
| | BASELINE | +PATCH | DELTA (%) |
|---------|------------|----------|-------------|
| mean | 5235.02 | 5863.73 | +12.01 |
| std (%) | 8.166 | 10.265 | |
100 iterations of: sysbench mutex --mutex-num=1 --threads=8 run
Lower result is better
| | BASELINE | +PATCH | DELTA (%) |
|---------|------------|----------|-------------|
| mean | 0.413 | 0.404 | +2.179 |
| std (%) | 3.791 | 1.816 | |
Signed-off-by: Peter Puhov <peter.puhov@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200714125941.4174-1-peter.puhov@linaro.org
|
|
The "ticks" parameter was added in commit 0f004f5a696a ("sched: Cure more
NO_HZ load average woes") since calc_global_nohz() was called and needed
the "ticks" argument.
But in commit c308b56b5398 ("sched: Fix nohz load accounting -- again!")
it became unused as the function calc_global_nohz() dropped using "ticks".
Fixes: c308b56b5398 ("sched: Fix nohz load accounting -- again!")
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/1593628458-32290-1-git-send-email-paul.gortmaker@windriver.com
|
|
Dave hit the problem fixed by commit:
b6e13e85829f ("sched/core: Fix ttwu() race")
and failed to understand much of the code involved. Per his request a
few comments to (hopefully) clarify things.
Requested-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200702125211.GQ4800@hirez.programming.kicks-ass.net
|