diff options
author | Linus Torvalds | 2022-03-24 10:16:00 -0700 |
---|---|---|
committer | Linus Torvalds | 2022-03-24 10:16:00 -0700 |
commit | cd4699c5fd66b00211f4709b9957bfd7b0a02ddc (patch) | |
tree | be9c279ca9597a9da17e649a521f60591fe4d103 /include | |
parent | 2e2d4650b34ffe0a39f70acc9429a58d94e39236 (diff) | |
parent | 18c91bb2d87268d23868bf13508f5bc9cf04e89a (diff) |
Merge tag 'prlimit-tasklist_lock-for-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull tasklist_lock optimizations from Eric Biederman:
"prlimit and getpriority tasklist_lock optimizations
The tasklist_lock popped up as a scalability bottleneck on some
testing workloads. The readlocks in do_prlimit and set/getpriority are
not necessary in all cases.
Based on a cycles profile, it looked like ~87% of the time was spent
in the kernel, ~42% of which was just trying to get *some* spinlock
(queued_spin_lock_slowpath, not necessarily the tasklist_lock).
The big offenders (with rough percentages in cycles of the overall
trace):
- do_wait 11%
- setpriority 8% (done previously in commit 7f8ca0edfe07)
- kill 8%
- do_exit 5%
- clone 3%
- prlimit64 2% (this patchset)
- getrlimit 1% (this patchset)
I can't easily test this patchset on the original workload for various
reasons. Instead, I used the microbenchmark below to at least verify
there was some improvement. This patchset had a 28% speedup (12% from
baseline to set/getprio, then another 14% for prlimit).
This series used to do the setpriority case, but an almost identical
change was merged as commit 7f8ca0edfe07 ("kernel/sys.c: only take
tasklist_lock for get/setpriority(PRIO_PGRP)") so that has been
dropped from here.
One interesting thing is that my libc's getrlimit() was calling
prlimit64, so hoisting the read_lock(tasklist_lock) into sys_prlimit64
had no effect - it essentially optimized the older syscalls only. I
didn't do that in this patchset, but figured I'd mention it since it
was an option from the previous patch's discussion"
micobenchmark.c:
---------------
int main(int argc, char **argv)
{
pid_t child;
struct rlimit rlim[1];
fork(); fork(); fork(); fork(); fork(); fork();
for (int i = 0; i < 5000; i++) {
child = fork();
if (child < 0)
exit(1);
if (child > 0) {
usleep(1000);
kill(child, SIGTERM);
waitpid(child, NULL, 0);
} else {
for (;;) {
setpriority(PRIO_PROCESS, 0,
getpriority(PRIO_PROCESS, 0));
getrlimit(RLIMIT_CPU, rlim);
}
}
}
return 0;
}
Link: https://lore.kernel.org/lkml/20211213220401.1039578-1-brho@google.com/ [v1]
Link: https://lore.kernel.org/lkml/20220105212828.197013-1-brho@google.com/ [v2]
Link: https://lore.kernel.org/lkml/20220106172041.522167-1-brho@google.com/ [v3]
* tag 'prlimit-tasklist_lock-for-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
prlimit: do not grab the tasklist_lock
prlimit: make do_prlimit() static
Diffstat (limited to 'include')
-rw-r--r-- | include/linux/posix-timers.h | 2 | ||||
-rw-r--r-- | include/linux/resource.h | 2 |
2 files changed, 1 insertions, 3 deletions
diff --git a/include/linux/posix-timers.h b/include/linux/posix-timers.h index 5bbcd280bfd2..9cf126c3b27f 100644 --- a/include/linux/posix-timers.h +++ b/include/linux/posix-timers.h @@ -253,7 +253,7 @@ void posix_cpu_timers_exit_group(struct task_struct *task); void set_process_cpu_timer(struct task_struct *task, unsigned int clock_idx, u64 *newval, u64 *oldval); -void update_rlimit_cpu(struct task_struct *task, unsigned long rlim_new); +int update_rlimit_cpu(struct task_struct *task, unsigned long rlim_new); void posixtimer_rearm(struct kernel_siginfo *info); #endif diff --git a/include/linux/resource.h b/include/linux/resource.h index bdf491cbcab7..4fdbc0c3f315 100644 --- a/include/linux/resource.h +++ b/include/linux/resource.h @@ -8,7 +8,5 @@ struct task_struct; void getrusage(struct task_struct *p, int who, struct rusage *ru); -int do_prlimit(struct task_struct *tsk, unsigned int resource, - struct rlimit *new_rlim, struct rlimit *old_rlim); #endif |