aboutsummaryrefslogtreecommitdiff
path: root/ipc/util.h
AgeCommit message (Collapse)Author
2017-07-12ipc/util.h: update documentation for ipc_getref() and ipc_putref()Manfred Spraul
Now that ipc_rcu_alloc() and ipc_rcu_free() are removed, document when it is valid to use ipc_getref() and ipc_putref(). Link: http://lkml.kernel.org/r/20170525185107.12869-21-manfred@colorfullife.com Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-12ipc/util: drop ipc_rcu_alloc()Kees Cook
No callers remain for ipc_rcu_alloc(). Drop the function. [manfred@colorfullife.com: Rediff because the memset was temporarily inside ipc_rcu_free()] Link: http://lkml.kernel.org/r/20170525185107.12869-13-manfred@colorfullife.com Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-12ipc/util: drop ipc_rcu_free()Kees Cook
There are no more callers of ipc_rcu_free(), so remove it. Link: http://lkml.kernel.org/r/20170525185107.12869-9-manfred@colorfullife.com Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-12ipc: drop non-RCU allocationKees Cook
The only users of ipc_alloc() were ipc_rcu_alloc() and the on-heap sem_io fall-back memory. Better to just open-code these to make things easier to read. [manfred@colorfullife.com: Rediff due to inclusion of memset() into ipc_rcu_alloc()] Link: http://lkml.kernel.org/r/20170525185107.12869-5-manfred@colorfullife.com Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-12ipc: merge ipc_rcu and kern_ipc_permManfred Spraul
ipc has two management structures that exist for every id: - struct kern_ipc_perm, it contains e.g. the permissions. - struct ipc_rcu, it contains the rcu head for rcu handling and the refcount. The patch merges both structures. As a bonus, we may save one cacheline, because both structures are cacheline aligned. In addition, it reduces the number of casts, instead most codepaths can use container_of. To simplify code, the ipc_rcu_alloc initializes the allocation to 0. [manfred@colorfullife.com: really include the memset() into ipc_alloc_rcu()] Link: http://lkml.kernel.org/r/564f8612-0601-b267-514f-a9f650ec9b32@colorfullife.com Link: http://lkml.kernel.org/r/20170525185107.12869-3-manfred@colorfullife.com Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-04-17ipc: Remove unused declaration of recompute_msgmniEric W. Biederman
The function recompute_msgmni was removed a while ago but it is still declared in a header file remove it. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2016-01-22tree wide: use kvfree() than conditional kfree()/vfree()Tetsuo Handa
There are many locations that do if (memory_was_allocated_by_vmalloc) vfree(ptr); else kfree(ptr); but kvfree() can handle both kmalloc()ed memory and vmalloc()ed memory using is_vmalloc_addr(). Unless callers have special reasons, we can replace this branch with kvfree(). Please check and reply if you found problems. Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Jan Kara <jack@suse.com> Acked-by: Russell King <rmk+kernel@arm.linux.org.uk> Reviewed-by: Andreas Dilger <andreas.dilger@intel.com> Acked-by: "Rafael J. Wysocki" <rjw@rjwysocki.net> Acked-by: David Rientjes <rientjes@google.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Oleg Drokin <oleg.drokin@intel.com> Cc: Boris Petkov <bp@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-30ipc: rename ipc_obtain_objectDavidlohr Bueso
... to ipc_obtain_object_idr, which is more meaningful and makes the code slightly easier to follow. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Cc: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06ipc, kernel: clear whitespacePaul McQuade
trailing whitespace Signed-off-by: Paul McQuade <paulmcquad@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06ipc: constify ipc_opsMathias Krause
There is no need to recreate the very same ipc_ops structure on every kernel entry for msgget/semget/shmget. Just declare it static and be done with it. While at it, constify it as we don't modify the structure at runtime. Found in the PaX patch, written by the PaX Team. Signed-off-by: Mathias Krause <minipli@googlemail.com> Cc: PaX Team <pageexec@freemail.hu> Cc: Davidlohr Bueso <davidlohr@hp.com> Cc: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-27ipc: delete seq_max field in struct ipc_idsDavidlohr Bueso
This field is only used to reset the ids seq number if it exceeds the smaller of INT_MAX/SEQ_MULTIPLIER and USHRT_MAX, and can therefore be moved out of the structure and into its own macro. Since each ipc_namespace contains a table of 3 pointers to struct ipc_ids we can save space in instruction text: text data bss dec hex filename 56232 2348 24 58604 e4ec ipc/built-in.o 56216 2348 24 58588 e4dc ipc/built-in.o-after Signed-off-by: Davidlohr Bueso <davidlohr@hp.com> Reviewed-by: Jonathan Gonzalez <jgonzalez@linets.cl> Cc: Aswin Chandramouleeswaran <aswin@hp.com> Cc: Rik van Riel <riel@redhat.com> Acked-by: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-27ipc: whitespace cleanupManfred Spraul
The ipc code does not adhere the typical linux coding style. This patch fixes lots of simple whitespace errors. - mostly autogenerated by scripts/checkpatch.pl -f --fix \ --types=pointer_location,spacing,space_before_tab - one manual fixup (keep structure members tab-aligned) - removal of additional space_before_tab that were not found by --fix Tested with some of my msg and sem test apps. Andrew: Could you include it in -mm and move it towards Linus' tree? Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Suggested-by: Li Bin <huawei.libin@huawei.com> Cc: Joe Perches <joe@perches.com> Acked-by: Rafael Aquini <aquini@redhat.com> Cc: Davidlohr Bueso <davidlohr@hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-27ipc: change kern_ipc_perm.deleted type to boolRafael Aquini
struct kern_ipc_perm.deleted is meant to be used as a boolean toggle, and the changes introduced by this patch are just to make the case explicit. Signed-off-by: Rafael Aquini <aquini@redhat.com> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Greg Thelen <gthelen@google.com> Acked-by: Davidlohr Bueso <davidlohr@hp.com> Cc: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-27ipc: introduce ipc_valid_object() helper to sort out IPC_RMID racesRafael Aquini
After the locking semantics for the SysV IPC API got improved, a couple of IPC_RMID race windows were opened because we ended up dropping the 'kern_ipc_perm.deleted' check performed way down in ipc_lock(). The spotted races got sorted out by re-introducing the old test within the racy critical sections. This patch introduces ipc_valid_object() to consolidate the way we cope with IPC_RMID races by using the same abstraction across the API implementation. Signed-off-by: Rafael Aquini <aquini@redhat.com> Acked-by: Rik van Riel <riel@redhat.com> Acked-by: Greg Thelen <gthelen@google.com> Reviewed-by: Davidlohr Bueso <davidlohr@hp.com> Cc: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-13ipc, msg: fix message length check for negative valuesMathias Krause
On 64 bit systems the test for negative message sizes is bogus as the size, which may be positive when evaluated as a long, will get truncated to an int when passed to load_msg(). So a long might very well contain a positive value but when truncated to an int it would become negative. That in combination with a small negative value of msg_ctlmax (which will be promoted to an unsigned type for the comparison against msgsz, making it a big positive value and therefore make it pass the check) will lead to two problems: 1/ The kmalloc() call in alloc_msg() will allocate a too small buffer as the addition of alen is effectively a subtraction. 2/ The copy_from_user() call in load_msg() will first overflow the buffer with userland data and then, when the userland access generates an access violation, the fixup handler copy_user_handle_tail() will try to fill the remainder with zeros -- roughly 4GB. That almost instantly results in a system crash or reset. ,-[ Reproducer (needs to be run as root) ]-- | #include <sys/stat.h> | #include <sys/msg.h> | #include <unistd.h> | #include <fcntl.h> | | int main(void) { | long msg = 1; | int fd; | | fd = open("/proc/sys/kernel/msgmax", O_WRONLY); | write(fd, "-1", 2); | close(fd); | | msgsnd(0, &msg, 0xfffffff0, IPC_NOWAIT); | | return 0; | } '--- Fix the issue by preventing msgsz from getting truncated by consistently using size_t for the message length. This way the size checks in do_msgsnd() could still be passed with a negative value for msg_ctlmax but we would fail on the buffer allocation in that case and error out. Also change the type of m_ts from int to size_t to avoid similar nastiness in other code paths -- it is used in similar constructs, i.e. signed vs. unsigned checks. It should never become negative under normal circumstances, though. Setting msg_ctlmax to a negative value is an odd configuration and should be prevented. As that might break existing userland, it will be handled in a separate commit so it could easily be reverted and reworked without reintroducing the above described bug. Hardening mechanisms for user copy operations would have catched that bug early -- e.g. checking slab object sizes on user copy operations as the usercopy feature of the PaX patch does. Or, for that matter, detect the long vs. int sign change due to truncation, as the size overflow plugin of the very same patch does. [akpm@linux-foundation.org: fix i386 min() warnings] Signed-off-by: Mathias Krause <minipli@googlemail.com> Cc: Pax Team <pageexec@freemail.hu> Cc: Davidlohr Bueso <davidlohr@hp.com> Cc: Brad Spengler <spender@grsecurity.net> Cc: Manfred Spraul <manfred@colorfullife.com> Cc: <stable@vger.kernel.org> [ v2.3.27+ -- yes, that old ;) ] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-24ipc: fix race with LSMsDavidlohr Bueso
Currently, IPC mechanisms do security and auditing related checks under RCU. However, since security modules can free the security structure, for example, through selinux_[sem,msg_queue,shm]_free_security(), we can race if the structure is freed before other tasks are done with it, creating a use-after-free condition. Manfred illustrates this nicely, for instance with shared mem and selinux: -> do_shmat calls rcu_read_lock() -> do_shmat calls shm_object_check(). Checks that the object is still valid - but doesn't acquire any locks. Then it returns. -> do_shmat calls security_shm_shmat (e.g. selinux_shm_shmat) -> selinux_shm_shmat calls ipc_has_perm() -> ipc_has_perm accesses ipc_perms->security shm_close() -> shm_close acquires rw_mutex & shm_lock -> shm_close calls shm_destroy -> shm_destroy calls security_shm_free (e.g. selinux_shm_free_security) -> selinux_shm_free_security calls ipc_free_security(&shp->shm_perm) -> ipc_free_security calls kfree(ipc_perms->security) This patch delays the freeing of the security structures after all RCU readers are done. Furthermore it aligns the security life cycle with that of the rest of IPC - freeing them based on the reference counter. For situations where we need not free security, the current behavior is kept. Linus states: "... the old behavior was suspect for another reason too: having the security blob go away from under a user sounds like it could cause various other problems anyway, so I think the old code was at least _prone_ to bugs even if it didn't have catastrophic behavior." I have tested this patch with IPC testcases from LTP on both my quad-core laptop and on a 64 core NUMA server. In both cases selinux is enabled, and tests pass for both voluntary and forced preemption models. While the mentioned races are theoretical (at least no one as reported them), I wanted to make sure that this new logic doesn't break anything we weren't aware of. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Davidlohr Bueso <davidlohr@hp.com> Acked-by: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-11ipc: drop ipc_lock_checkDavidlohr Bueso
No remaining users, we now use ipc_obtain_object_check(). Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com> Cc: Sedat Dilek <sedat.dilek@gmail.com> Cc: Rik van Riel <riel@redhat.com> Cc: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-11ipc: drop ipc_lock_by_ptrDavidlohr Bueso
After previous cleanups and optimizations, this function is no longer heavily used and we don't have a good reason to keep it. Update the few remaining callers and get rid of it. Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com> Cc: Sedat Dilek <sedat.dilek@gmail.com> Cc: Rik van Riel <riel@redhat.com> Cc: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-11ipc: rename ids->rw_mutexDavidlohr Bueso
Since in some situations the lock can be shared for readers, we shouldn't be calling it a mutex, rename it to rwsem. Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com> Tested-by: Sedat Dilek <sedat.dilek@gmail.com> Cc: Rik van Riel <riel@redhat.com> Cc: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-11ipc: drop ipcctl_pre_downDavidlohr Bueso
Now that sem, msgque and shm, through *_down(), all use the lockless variant of ipcctl_pre_down(), go ahead and delete it. [akpm@linux-foundation.org: fix function name in kerneldoc, cleanups] Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com> Tested-by: Sedat Dilek <sedat.dilek@gmail.com> Cc: Rik van Riel <riel@redhat.com> Cc: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-09ipc: close open coded spin lock callsDavidlohr Bueso
Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-09ipc: introduce ipc object locking helpersDavidlohr Bueso
Simple helpers around the (kern_ipc_perm *)->lock spinlock. Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-01ipc,sem: fine grained locking for semtimedopRik van Riel
Introduce finer grained locking for semtimedop, to handle the common case of a program wanting to manipulate one semaphore from an array with multiple semaphores. If the call is a semop manipulating just one semaphore in an array with multiple semaphores, only take the lock for that semaphore itself. If the call needs to manipulate multiple semaphores, or another caller is in a transaction that manipulates multiple semaphores, the sem_array lock is taken, as well as all the locks for the individual semaphores. On a 24 CPU system, performance numbers with the semop-multi test with N threads and N semaphores, look like this: vanilla Davidlohr's Davidlohr's + Davidlohr's + threads patches rwlock patches v3 patches 10 610652 726325 1783589 2142206 20 341570 365699 1520453 1977878 30 288102 307037 1498167 2037995 40 290714 305955 1612665 2256484 50 288620 312890 1733453 2650292 60 289987 306043 1649360 2388008 70 291298 306347 1723167 2717486 80 290948 305662 1729545 2763582 90 290996 306680 1736021 2757524 100 292243 306700 1773700 3059159 [davidlohr.bueso@hp.com: do not call sem_lock when bogus sma] [davidlohr.bueso@hp.com: make refcounter atomic] Signed-off-by: Rik van Riel <riel@redhat.com> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Acked-by: Davidlohr Bueso <davidlohr.bueso@hp.com> Cc: Chegu Vinod <chegu_vinod@hp.com> Cc: Jason Low <jason.low2@hp.com> Reviewed-by: Michel Lespinasse <walken@google.com> Cc: Peter Hurley <peter@hurleysoftware.com> Cc: Stanislav Kinsbursky <skinsbursky@parallels.com> Tested-by: Emmanuel Benisty <benisty.e@gmail.com> Tested-by: Sedat Dilek <sedat.dilek@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-01ipc,sem: do not hold ipc lock more than necessaryDavidlohr Bueso
Instead of holding the ipc lock for permissions and security checks, among others, only acquire it when necessary. Some numbers.... 1) With Rik's semop-multi.c microbenchmark we can see the following results: Baseline (3.9-rc1): cpus 4, threads: 256, semaphores: 128, test duration: 30 secs total operations: 151452270, ops/sec 5048409 + 59.40% a.out [kernel.kallsyms] [k] _raw_spin_lock + 6.14% a.out [kernel.kallsyms] [k] sys_semtimedop + 3.84% a.out [kernel.kallsyms] [k] avc_has_perm_flags + 3.64% a.out [kernel.kallsyms] [k] __audit_syscall_exit + 2.06% a.out [kernel.kallsyms] [k] copy_user_enhanced_fast_string + 1.86% a.out [kernel.kallsyms] [k] ipc_lock With this patchset: cpus 4, threads: 256, semaphores: 128, test duration: 30 secs total operations: 273156400, ops/sec 9105213 + 18.54% a.out [kernel.kallsyms] [k] _raw_spin_lock + 11.72% a.out [kernel.kallsyms] [k] sys_semtimedop + 7.70% a.out [kernel.kallsyms] [k] ipc_has_perm.isra.21 + 6.58% a.out [kernel.kallsyms] [k] avc_has_perm_flags + 6.54% a.out [kernel.kallsyms] [k] __audit_syscall_exit + 4.71% a.out [kernel.kallsyms] [k] ipc_obtain_object_check 2) While on an Oracle swingbench DSS (data mining) workload the improvements are not as exciting as with Rik's benchmark, we can see some positive numbers. For an 8 socket machine the following are the percentages of %sys time incurred in the ipc lock: Baseline (3.9-rc1): 100 swingbench users: 8,74% 400 swingbench users: 21,86% 800 swingbench users: 84,35% With this patchset: 100 swingbench users: 8,11% 400 swingbench users: 19,93% 800 swingbench users: 77,69% [riel@redhat.com: fix two locking bugs] [sasha.levin@oracle.com: prevent releasing RCU read lock twice in semctl_main] [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com> Signed-off-by: Rik van Riel <riel@redhat.com> Reviewed-by: Chegu Vinod <chegu_vinod@hp.com> Acked-by: Michel Lespinasse <walken@google.com> Cc: Rik van Riel <riel@redhat.com> Cc: Jason Low <jason.low2@hp.com> Cc: Emmanuel Benisty <benisty.e@gmail.com> Cc: Peter Hurley <peter@hurleysoftware.com> Cc: Stanislav Kinsbursky <skinsbursky@parallels.com> Tested-by: Sedat Dilek <sedat.dilek@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-01ipc: introduce lockless pre_down ipcctlDavidlohr Bueso
Various forms of ipc use ipcctl_pre_down() to retrieve an ipc object and check permissions, mostly for IPC_RMID and IPC_SET commands. Introduce ipcctl_pre_down_nolock(), a lockless version of this function. The locking version is retained, yet modified to call the nolock version without affecting its semantics, thus transparent to all ipc callers. Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com> Signed-off-by: Rik van Riel <riel@redhat.com> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Chegu Vinod <chegu_vinod@hp.com> Cc: Emmanuel Benisty <benisty.e@gmail.com> Cc: Jason Low <jason.low2@hp.com> Cc: Michel Lespinasse <walken@google.com> Cc: Peter Hurley <peter@hurleysoftware.com> Cc: Stanislav Kinsbursky <skinsbursky@parallels.com> Tested-by: Sedat Dilek <sedat.dilek@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-01ipc: introduce obtaining a lockless ipc objectDavidlohr Bueso
Through ipc_lock() and therefore ipc_lock_check() we currently return the locked ipc object. This is not necessary for all situations and can, therefore, cause unnecessary ipc lock contention. Introduce analogous ipc_obtain_object() and ipc_obtain_object_check() functions that only lookup and return the ipc object. Both these functions must be called within the RCU read critical section. [akpm@linux-foundation.org: propagate the ipc_obtain_object() errno from ipc_lock()] Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com> Signed-off-by: Rik van Riel <riel@redhat.com> Reviewed-by: Chegu Vinod <chegu_vinod@hp.com> Acked-by: Michel Lespinasse <walken@google.com> Cc: Emmanuel Benisty <benisty.e@gmail.com> Cc: Jason Low <jason.low2@hp.com> Cc: Peter Hurley <peter@hurleysoftware.com> Cc: Stanislav Kinsbursky <skinsbursky@parallels.com> Tested-by: Sedat Dilek <sedat.dilek@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-01ipc: remove bogus lock comment for ipc_checkidDavidlohr Bueso
This series makes the sysv semaphore code more scalable, by reducing the time the semaphore lock is held, and making the locking more scalable for semaphore arrays with multiple semaphores. The first four patches were written by Davidlohr Buesso, and reduce the hold time of the semaphore lock. The last three patches change the sysv semaphore code locking to be more fine grained, providing a performance boost when multiple semaphores in a semaphore array are being manipulated simultaneously. On a 24 CPU system, performance numbers with the semop-multi test with N threads and N semaphores, look like this: vanilla Davidlohr's Davidlohr's + Davidlohr's + threads patches rwlock patches v3 patches 10 610652 726325 1783589 2142206 20 341570 365699 1520453 1977878 30 288102 307037 1498167 2037995 40 290714 305955 1612665 2256484 50 288620 312890 1733453 2650292 60 289987 306043 1649360 2388008 70 291298 306347 1723167 2717486 80 290948 305662 1729545 2763582 90 290996 306680 1736021 2757524 100 292243 306700 1773700 3059159 This patch: There is no reason to be holding the ipc lock while reading ipcp->seq, hence remove misleading comment. Also simplify the return value for the function. Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com> Signed-off-by: Rik van Riel <riel@redhat.com> Cc: Chegu Vinod <chegu_vinod@hp.com> Cc: Emmanuel Benisty <benisty.e@gmail.com> Cc: Jason Low <jason.low2@hp.com> Cc: Michel Lespinasse <walken@google.com> Cc: Peter Hurley <peter@hurleysoftware.com> Cc: Stanislav Kinsbursky <skinsbursky@parallels.com> Tested-by: Sedat Dilek <sedat.dilek@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-01-04ipc: introduce message queue copy featureStanislav Kinsbursky
This patch is required for checkpoint/restore in userspace. c/r requires some way to get all pending IPC messages without deleting them from the queue (checkpoint can fail and in this case tasks will be resumed, so queue have to be valid). To achive this, new operation flag MSG_COPY for sys_msgrcv() system call was introduced. If this flag was specified, then mtype is interpreted as number of the message to copy. If MSG_COPY is set, then kernel will allocate dummy message with passed size, and then use new copy_msg() helper function to copy desired message (instead of unlinking it from the queue). Notes: 1) Return -ENOSYS if MSG_COPY is specified, but CONFIG_CHECKPOINT_RESTORE is not set. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Cc: Serge Hallyn <serge.hallyn@canonical.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-01-04ipc: add sysctl to specify desired next object idStanislav Kinsbursky
Add 3 new variables and sysctls to tune them (by one "next_id" variable for messages, semaphores and shared memory respectively). This variable can be used to set desired id for next allocated IPC object. By default it's equal to -1 and old behaviour is preserved. If this variable is non-negative, then desired idr will be extracted from it and used as a start value to search for free IDR slot. Notes: 1) this patch doesn't guarantee that the new object will have desired id. So it's up to user space how to handle new object with wrong id. 2) After a sucessful id allocation attempt, "next_id" will be set back to -1 (if it was non-negative). [akpm@linux-foundation.org: checkpatch fixes] Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Cc: Serge Hallyn <serge.hallyn@canonical.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-09-06userns: Convert ipc to use kuid and kgid where appropriateEric W. Biederman
- Store the ipc owner and creator with a kuid - Store the ipc group and the crators group with a kgid. - Add error handling to ipc_update_perms, allowing it to fail if the uids and gids can not be converted to kuids or kgids. - Modify the proc files to display the ipc creator and owner in the user namespace of the opener of the proc file. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-07-30ipc: use Kconfig options for __ARCH_WANT_[COMPAT_]IPC_PARSE_VERSIONWill Deacon
Rather than #define the options manually in the architecture code, add Kconfig options for them and select them there instead. This also allows us to select the compat IPC version parsing automatically for platforms using the old compat IPC interface. Reported-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Will Deacon <will.deacon@arm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-03-23userns: user namespaces: convert several capable() callsSerge E. Hallyn
CAP_IPC_OWNER and CAP_IPC_LOCK can be checked against current_user_ns(), because the resource comes from current's own ipc namespace. setuid/setgid are to uids in own namespace, so again checks can be against current_user_ns(). Changelog: Jan 11: Use task_ns_capable() in place of sched_capable(). Jan 11: Use nsown_capable() as suggested by Bastian Blank. Jan 11: Clarify (hopefully) some logic in futex and sched.c Feb 15: use ns_capable for ipc, not nsown_capable Feb 23: let copy_ipcs handle setting ipc_ns->user_ns Feb 23: pass ns down rather than taking it from current [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Daniel Lezcano <daniel.lezcano@free.fr> Acked-by: David Howells <dhowells@redhat.com> Cc: James Morris <jmorris@namei.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-21ipc: unbreak 32-bit shmctl/semctl/msgctlJohannes Weiner
31a985f "ipc: use __ARCH_WANT_IPC_PARSE_VERSION in ipc/util.h" would choose the implementation of ipc_parse_version() based on a symbol defined in <asm/unistd.h>. But it failed to also include this header and thus broke IPC_64-passing 32-bit userspace because the flag wasn't masked out properly anymore and the command not understood. Include <linux/unistd.h> to give the architecture a chance to ask for the no-no-op ipc_parse_version(). Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-18ipcns: move free_ipcs() protoAlexey Dobriyan
Function is really private to ipc/ and avoid struct kern_ipc_perm forward declaration. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: WANG Cong <xiyou.wangcong@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-18ipc: use __ARCH_WANT_IPC_PARSE_VERSION in ipc/util.hArnd Bergmann
The definition of ipc_parse_version depends on __ARCH_WANT_IPC_PARSE_VERSION, but the header file declares it conditionally based on the architecture. Use the macro consistently to make it easier to add new architectures. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-07namespaces: ipc namespaces: implement support for posix msqueuesSerge E. Hallyn
Implement multiple mounts of the mqueue file system, and link it to usage of CLONE_NEWIPC. Each ipc ns has a corresponding mqueuefs superblock. When a user does clone(CLONE_NEWIPC) or unshare(CLONE_NEWIPC), the unshare will cause an internal mount of a new mqueuefs sb linked to the new ipc ns. When a user does 'mount -t mqueue mqueue /dev/mqueue', he mounts the mqueuefs superblock. Posix message queues can be worked with both through the mq_* system calls (see mq_overview(7)), and through the VFS through the mqueue mount. Any usage of mq_open() and friends will work with the acting task's ipc namespace. Any actions through the VFS will work with the mqueuefs in which the file was created. So if a user doesn't remount mqueuefs after unshare(CLONE_NEWIPC), mq_open("/ab") will not be reflected in "ls /dev/mqueue". If task a mounts mqueue for ipc_ns:1, then clones task b with a new ipcns, ipcns:2, and then task a is the last task in ipc_ns:1 to exit, then (1) ipc_ns:1 will be freed, (2) it's superblock will live on until task b umounts the corresponding mqueuefs, and vfs actions will continue to succeed, but (3) sb->s_fs_info will be NULL for the sb corresponding to the deceased ipc_ns:1. To make this happen, we must protect the ipc reference count when a) a task exits and drops its ipcns->count, since it might be dropping it to 0 and freeing the ipcns b) a task accesses the ipcns through its mqueuefs interface, since it bumps the ipcns refcount and might race with the last task in the ipcns exiting. So the kref is changed to an atomic_t so we can use atomic_dec_and_lock(&ns->count,mq_lock), and every access to the ipcns through ns = mqueuefs_sb->s_fs_info is protected by the same lock. Signed-off-by: Cedric Le Goater <clg@fr.ibm.com> Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-07namespaces: mqueue ns: move mqueue_mnt into struct ipc_namespaceSerge E. Hallyn
Move mqueue vfsmount plus a few tunables into the ipc_namespace struct. The CONFIG_IPC_NS boolean and the ipc_namespace struct will serve both the posix message queue namespaces and the SYSV ipc namespaces. The sysctl code will be fixed separately in patch 3. After just this patch, making a change to posix mqueue tunables always changes the values in the initial ipc namespace. Signed-off-by: Cedric Le Goater <clg@fr.ibm.com> Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25ipc: get rid of ipc_lock_down()Nadia Derbey
Remove the ipc_lock_down() routines: they used to call idr_find() locklessly (given that the ipc ids lock was already held), so they are not needed anymore. Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net> Acked-by: "Paul E. McKenney" <paulmck@us.ibm.com> Cc: Manfred Spraul <manfred@colorfullife.com> Cc: Jim Houston <jim.houston@comcast.net> Cc: Pierre Peiffer <peifferp@gmail.com> Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29ipc: add definitions of USHORT_MAX and othersZhang, Yanmin
Add definitions of USHORT_MAX and others into kernel. ipc uses it and slub implementation might also use it. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Zhang Yanmin <yanmin.zhang@intel.com> Reviewed-by: Christoph Lameter <clameter@sgi.com> Cc: Nadia Derbey <Nadia.Derbey@bull.net> Cc: "Pierre Peiffer" <peifferp@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29IPC: consolidate all xxxctl_down() functionsPierre Peiffer
semctl_down(), msgctl_down() and shmctl_down() are used to handle the same set of commands for each kind of IPC. They all start to do the same job (they retrieve the ipc and do some permission checks) before handling the commands on their own. This patch proposes to consolidate this by moving these same pieces of code into one common function called ipcctl_pre_down(). It simplifies a little these xxxctl_down() functions and increases a little the maintainability. Signed-off-by: Pierre Peiffer <pierre.peiffer@bull.net> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: Nadia Derbey <Nadia.Derbey@bull.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29IPC: introduce ipc_update_perm()Pierre Peiffer
The IPC_SET command performs the same permission setting for all IPCs. This patch introduces a common ipc_update_perm() function to update these permissions and makes use of it for all IPCs. Signed-off-by: Pierre Peiffer <pierre.peiffer@bull.net> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: Nadia Derbey <Nadia.Derbey@bull.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29ipc: recompute msgmni on memory add / removeNadia Derbey
Introduce the registration of a callback routine that recomputes msg_ctlmni upon memory add / remove. A single notifier block is registered in the hotplug memory chain for all the ipc namespaces. Since the ipc namespaces are not linked together, they have their own notification chain: one notifier_block is defined per ipc namespace. Each time an ipc namespace is created (removed) it registers (unregisters) its notifier block in (from) the ipcns chain. The callback routine registered in the memory chain invokes the ipcns notifier chain with the IPCNS_LOWMEM event. Each callback routine registered in the ipcns namespace, in turn, recomputes msgmni for the owning namespace. Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net> Cc: Yasunori Goto <y-goto@jp.fujitsu.com> Cc: Matt Helsley <matthltc@us.ibm.com> Cc: Mingming Cao <cmm@us.ibm.com> Cc: Pierre Peiffer <pierre.peiffer@bull.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08IPC: make struct ipc_ids static in ipc_namespacePierre Peiffer
Each ipc_namespace contains a table of 3 pointers to struct ipc_ids (3 for msg, sem and shm, structure used to store all ipcs) These 'struct ipc_ids' are dynamically allocated for each icp_namespace as the ipc_namespace itself (for the init namespace, they are initialized with pointers to static variables instead) It is so for historical reason: in fact, before the use of idr to store the ipcs, the ipcs were stored in tables of variable length, depending of the maximum number of ipc allowed. Now, these 'struct ipc_ids' have a fixed size. As they are allocated in any cases for each new ipc_namespace, there is no gain of memory in having them allocated separately of the struct ipc_namespace. This patch proposes to make this table static in the struct ipc_namespace. Thus, we can allocate all in once and get rid of all the code needed to allocate and free these ipc_ids separately. Signed-off-by: Pierre Peiffer <pierre.peiffer@bull.net> Acked-by: Cedric Le Goater <clg@fr.ibm.com> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: Nadia Derbey <Nadia.Derbey@bull.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08ipc: uninline some code from util.hPavel Emelyanov
ipc_lock_check_down(), ipc_lock_check() and ipcget() seem too large to be inline. Besides, they give no optimization being inline as they perform calls inside in any case. Moving them into ipc/util.c saves 500 bytes of vmlinux and shortens IPC internal API. $ ./scripts/bloat-o-meter vmlinux-orig vmlinux add/remove: 3/2 grow/shrink: 0/10 up/down: 490/-989 (-499) function old new delta ipcget - 392 +392 ipc_lock_check_down - 49 +49 ipc_lock_check - 49 +49 sys_semget 119 105 -14 sys_shmget 108 86 -22 sys_msgget 100 78 -22 do_msgsnd 665 631 -34 do_msgrcv 680 644 -36 do_shmat 771 733 -38 sys_msgctl 1302 1229 -73 ipcget_new 80 - -80 sys_semtimedop 1534 1452 -82 sys_semctl 2034 1922 -112 sys_shmctl 1919 1765 -154 ipcget_public 322 - -322 The ipcget() growth is the result of gcc inlining of currently static ipcget_new/_public. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Cc: Nadia Derbey <Nadia.Derbey@bull.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08namespaces: move the IPC namespace under IPC_NS optionPavel Emelyanov
Currently the IPC namespace management code is spread over the ipc/*.c files. I moved this code into ipc/namespace.c file which is compiled out when needed. The linux/ipc_namespace.h file is used to store the prototypes of the functions in namespace.c and the stubs for NAMESPACES=n case. This is done so, because the stub for copy_ipc_namespace requires the knowledge of the CLONE_NEWIPC flag, which is in sched.h. But the linux/ipc.h file itself in included into many many .c files via the sys.h->sem.h sequence so adding the sched.h into it will make all these .c depend on sched.h which is not that good. On the other hand the knowledge about the namespaces stuff is required in 4 .c files only. Besides, this patch compiles out some auxiliary functions from ipc/sem.c, msg.c and shm.c files. It turned out that moving these functions into namespaces.c is not that easy because they use many other calls and macros from the original file. Moving them would make this patch complicated. On the other hand all these functions can be consolidated, so I will send a separate patch doing this a bit later. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Kirill Korotaev <dev@sw.ru> Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-19ipc: remove unneeded parametersNadia Derbey
Remvoe the unneeded parameters from ipc_checkid() and ipc_buildid() interfaces. Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-19fix idr_find() lockingNadia Derbey
This is a patch that fixes the way idr_find() used to be called in ipc_lock(): in all the paths that don't imply an update of the ipcs idr, it was called without the idr tree being locked. The changes are: . in ipc_ids, the mutex has been changed into a reader/writer semaphore. . ipc_lock() now takes the mutex as a reader during the idr_find(). . a new routine ipc_lock_down() has been defined: it doesn't take the mutex, assuming that it is being held by the caller. This is the routine that is now called in all the update paths. Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net> Acked-by: Jarek Poplawski <jarkao2@o2.pl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-19ipc: fix wrong commentsNadia Derbey
This patch fixes the wrong / obsolete comments in the ipc code. Also adds a missing lock around ipc_get_maxid() in shm_get_stat(). Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-19ipc: inline ipc_buildid()Nadia Derbey
This is a trivial patch that changes the ipc_buildid() routine into a static inline. Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-19ipc: introduce the ipcid_to_idx macroNadia Derbey
This is a trivial patch that changes all the (id % SEQ_MULTIPLIER) into a call to the ipcid_to_idx(id) macro. Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>