aboutsummaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)Author
2011-01-13thp: khugepagedAndrea Arcangeli
Add khugepaged to relocate fragmented pages into hugepages if new hugepages become available. (this is indipendent of the defrag logic that will have to make new hugepages available) The fundamental reason why khugepaged is unavoidable, is that some memory can be fragmented and not everything can be relocated. So when a virtual machine quits and releases gigabytes of hugepages, we want to use those freely available hugepages to create huge-pmd in the other virtual machines that may be running on fragmented memory, to maximize the CPU efficiency at all times. The scan is slow, it takes nearly zero cpu time, except when it copies data (in which case it means we definitely want to pay for that cpu time) so it seems a good tradeoff. In addition to the hugepages being released by other process releasing memory, we have the strong suspicion that the performance impact of potentially defragmenting hugepages during or before each page fault could lead to more performance inconsistency than allocating small pages at first and having them collapsed into large pages later... if they prove themselfs to be long lived mappings (khugepaged scan is slow so short lived mappings have low probability to run into khugepaged if compared to long lived mappings). Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13thp: add pmd_huge_pte to mm_structAndrea Arcangeli
This increase the size of the mm struct a bit but it is needed to preallocate one pte for each hugepage so that split_huge_page will not require a fail path. Guarantee of success is a fundamental property of split_huge_page to avoid decrasing swapping reliability and to avoid adding -ENOMEM fail paths that would otherwise force the hugepage-unaware VM code to learn rolling back in the middle of its pte mangling operations (if something we need it to learn handling pmd_trans_huge natively rather being capable of rollback). When split_huge_page runs a pte is needed to succeed the split, to map the newly splitted regular pages with a regular pte. This way all existing VM code remains backwards compatible by just adding a split_huge_page* one liner. The memory waste of those preallocated ptes is negligible and so it is worth it. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Acked-by: Rik van Riel <riel@redhat.com> Acked-by: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13thp: update futex compound knowledgeAndrea Arcangeli
Futex code is smarter than most other gup_fast O_DIRECT code and knows about the compound internals. However now doing a put_page(head_page) will not release the pin on the tail page taken by gup-fast, leading to all sort of refcounting bugchecks. Getting a stable head_page is a little tricky. page_head = page is there because if this is not a tail page it's also the page_head. Only in case this is a tail page, compound_head is called, otherwise it's guaranteed unnecessary. And if it's a tail page compound_head has to run atomically inside irq disabled section __get_user_pages_fast before returning. Otherwise ->first_page won't be a stable pointer. Disableing irq before __get_user_page_fast and releasing irq after running compound_head is needed because if __get_user_page_fast returns == 1, it means the huge pmd is established and cannot go away from under us. pmdp_splitting_flush_notify in __split_huge_page_splitting will have to wait for local_irq_enable before the IPI delivery can return. This means __split_huge_page_refcount can't be running from under us, and in turn when we run compound_head(page) we're not reading a dangling pointer from tailpage->first_page. Then after we get to stable head page, we are always safe to call compound_lock and after taking the compound lock on head page we can finally re-check if the page returned by gup-fast is still a tail page. in which case we're set and we didn't need to split the hugepage in order to take a futex on it. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Acked-by: Mel Gorman <mel@csn.ul.ie> Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13oom: allow a non-CAP_SYS_RESOURCE proces to oom_score_adj downMandeep Singh Baines
We'd like to be able to oom_score_adj a process up/down as it enters/leaves the foreground. Currently, it is not possible to oom_adj down without CAP_SYS_RESOURCE. This patch allows a task to decrease its oom_score_adj back to the value that a CAP_SYS_RESOURCE thread set it to or its inherited value at fork. Assuming the thread that has forked it has oom_score_adj of 0, each process could decrease it back from 0 upon activation unless a CAP_SYS_RESOURCE thread elevated it to something higher. Alternative considered: * a setuid binary * a daemon with CAP_SYS_RESOURCE Since you don't wan't all processes to be able to reduce their oom_adj, a setuid or daemon implementation would be complex. The alternatives also have much higher overhead. This patch updated from original patch based on feedback from David Rientjes. Signed-off-by: Mandeep Singh Baines <msb@chromium.org> Acked-by: David Rientjes <rientjes@google.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Rik van Riel <riel@redhat.com> Cc: Ying Han <yinghan@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13sched: remove long deprecated CLONE_STOPPED flagDave Jones
This warning was added in commit bdff746a3915 ("clone: prepare to recycle CLONE_STOPPED") three years ago. 2.6.26 came and went. As far as I know, no-one is actually using CLONE_STOPPED. Signed-off-by: Dave Jones <davej@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13irq: use per_cpu kstat_irqsEric Dumazet
Use modern per_cpu API to increment {soft|hard}irq counters, and use per_cpu allocation for (struct irq_desc)->kstats_irq instead of an array. This gives better SMP/NUMA locality and saves few instructions per irq. With small nr_cpuids values (8 for example), kstats_irq was a small array (less than L1_CACHE_BYTES), potentially source of false sharing. In the !CONFIG_SPARSE_IRQ case, remove the huge, NUMA/cache unfriendly kstat_irqs_all[NR_IRQS][NR_CPUS] array. Note: we still populate kstats_irq for all possible irqs in early_irq_init(). We probably could use on-demand allocations. (Code included in alloc_descs()). Problem is not all IRQS are used with a prior alloc_descs() call. kstat_irqs_this_cpu() is not used anymore, remove it. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Reviewed-by: Christoph Lameter <cl@linux.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Andi Kleen <andi@firstfloor.org> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13Merge branch 'for-2.6.38/core' of git://git.kernel.dk/linux-2.6-blockLinus Torvalds
* 'for-2.6.38/core' of git://git.kernel.dk/linux-2.6-block: (43 commits) block: ensure that completion error gets properly traced blktrace: add missing probe argument to block_bio_complete block cfq: don't use atomic_t for cfq_group block cfq: don't use atomic_t for cfq_queue block: trace event block fix unassigned field block: add internal hd part table references block: fix accounting bug on cross partition merges kref: add kref_test_and_get bio-integrity: mark kintegrityd_wq highpri and CPU intensive block: make kblockd_workqueue smarter Revert "sd: implement sd_check_events()" block: Clean up exit_io_context() source code. Fix compile warnings due to missing removal of a 'ret' variable fs/block: type signature of major_to_index(int) to major_to_index(unsigned) block: convert !IS_ERR(p) && p to !IS_ERR_NOR_NULL(p) cfq-iosched: don't check cfqg in choose_service_tree() fs/splice: Pull buf->ops->confirm() from splice_from_pipe actors cdrom: export cdrom_check_events() sd: implement sd_check_events() sr: implement sr_check_events() ...
2011-01-13Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (41 commits) fs: add documentation on fallocate hole punching Gfs2: fail if we try to use hole punch Btrfs: fail if we try to use hole punch Ext4: fail if we try to use hole punch Ocfs2: handle hole punching via fallocate properly XFS: handle hole punching via fallocate properly fs: add hole punching to fallocate vfs: pass struct file to do_truncate on O_TRUNC opens (try #2) fix signedness mess in rw_verify_area() on 64bit architectures fs: fix kernel-doc for dcache::prepend_path fs: fix kernel-doc for dcache::d_validate sanitize ecryptfs ->mount() switch afs move internal-only parts of ncpfs headers to fs/ncpfs switch ncpfs switch 9p pass default dentry_operations to mount_pseudo() switch hostfs switch affs switch configfs ...
2011-01-13Merge branch 'for-next' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (43 commits) Documentation/trace/events.txt: Remove obsolete sched_signal_send. writeback: fix global_dirty_limits comment runtime -> real-time ppc: fix comment typo singal -> signal drivers: fix comment typo diable -> disable. m68k: fix comment typo diable -> disable. wireless: comment typo fix diable -> disable. media: comment typo fix diable -> disable. remove doc for obsolete dynamic-printk kernel-parameter remove extraneous 'is' from Documentation/iostats.txt Fix spelling milisec -> ms in snd_ps3 module parameter description Fix spelling mistakes in comments Revert conflicting V4L changes i7core_edac: fix typos in comments mm/rmap.c: fix comment sound, ca0106: Fix assignment to 'channel'. hrtimer: fix a typo in comment init/Kconfig: fix typo anon_inodes: fix wrong function name in comment fix comment typos concerning "consistent" poll: fix a typo in comment ... Fix up trivial conflicts in: - drivers/net/wireless/iwlwifi/iwl-core.c (moved to iwl-legacy.c) - fs/ext4/ext4.h Also fix missed 'diabled' typo in drivers/net/bnx2x/bnx2x.h while at it.
2011-01-13pps: capture MONOTONIC_RAW timestamps as wellAlexander Gordeev
MONOTONIC_RAW clock timestamps are ideally suited for frequency calculation and also fit well into the original NTP hardpps design. Now phase and frequency can be adjusted separately: the former based on REALTIME clock and the latter based on MONOTONIC_RAW clock. A new function getnstime_raw_and_real is added to timekeeping subsystem to capture both timestamps at the same time and atomically. Signed-off-by: Alexander Gordeev <lasaine@lvk.cs.msu.su> Acked-by: John Stultz <johnstul@us.ibm.com> Cc: Rodolfo Giometti <giometti@enneenne.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13ntp: add hardpps implementationAlexander Gordeev
This commit adds hardpps() implementation based upon the original one from the NTPv4 reference kernel code from David Mills. However, it is highly optimized towards very fast syncronization and maximum stickness to PPS signal. The typical error is less then a microsecond. To make it sync faster I had to throw away exponential phase filter so that the full phase offset is corrected immediately. Then I also had to throw away median phase filter because it gives a bigger error itself if used without exponential filter. Maybe we will find an appropriate filtering scheme in the future but it's not necessary if the signal quality is ok. Signed-off-by: Alexander Gordeev <lasaine@lvk.cs.msu.su> Acked-by: John Stultz <johnstul@us.ibm.com> Cc: Rodolfo Giometti <giometti@enneenne.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13taskstats: use better ifdef for alignmentJeff Mahoney
Commit 4be2c95d ("taskstats: pad taskstats netlink response for aligment issues on ia64") added a null field to align the taskstats structure but the discussion centered around ia64. The issue exists on other platforms with inefficient unaligned access and adding them piecemeal would be an unmaintainable mess. This patch uses Dave Miller's suggestion of using a combination of CONFIG_64BIT && !CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS to determine whether alignment is needed. Note that this will cause breakage on those platforms with applications like iotop which had hard-coded offsets into the packet to access the taskstats structure. The message seen on systems without the alignment fixes looks like: kernel unaligned access to 0xe000023879dca9bc, ip=0xa000000100133d10 The addresses may vary but resolve to locations inside __delayacct_add_tsk. iotop makes what I'd call unreasonable assumptions about the contents of a netlink genetlink packet containing generic attributes. They're typed and have headers that specify value lengths, so the client can (should) identify and skip the ones the client doesn't understand. The kernel, as of version 2.6.36, presented a packet like so: +--------------------------------+ | genlmsghdr - 4 bytes | +--------------------------------+ | NLA header - 4 bytes | /* Aggregate header */ +-+------------------------------+ | | NLA header - 4 bytes | /* PID header */ | +------------------------------+ | | pid/tgid - 4 bytes | | +------------------------------+ | | NLA header - 4 bytes | /* stats header */ | + -----------------------------+ <- oops. aligned on 4 byte boundary | | struct taskstats - 328 bytes | +-+------------------------------+ The iotop code expects that the kernel will behave as it did then, assuming that the packet format is set in stone. The format is set in stone, but the packet offsets are not. There's nothing in the packet format that guarantees that the packet will be sent in exactly the same way. The attribute contents are set (or versioned) and the aggregate contents are set but they can be anywhere in the packet. The issue here isn't that an unaligned structure gets passed to userspace, it's that the NLA infrastructure has something of a weakness: The 4 byte attribute header may force the payload to be unaligned. The taskstats structure is created at an unaligned location and then 64-bit values are operated on inside the kernel, so the unaligned access warnings gets spewed everywhere. It's possible to use the unaligned access API to operate on the structure in the kernel but it seems like a wasted effort to work around userspace code that isn't following the packet format. Any new additions would also need the be worked around. It's a maintenance nightmare. The conclusion of the earlier discussion seemed to be "ok fine, if we have to break it, don't break it on arches that don't have the problem." Dave pointed out that the unaligned access problem doesn't only exist on ia64, but also on other 64-bit arches that don't have efficient unaligned access and it should be fixed there as well. The committed version of the patch and this addition keep with the conclusion of that discussion not to break it unnecessarily, which the pid padding and the packet padding fixes did do. x86_64 and powerpc don't suffer this problem so they shouldn't suffer the solution. Other 64-bit architectures do and will, though. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Reported-by: David S. Miller <davem@davemloft.net> Acked-by: David S. Miller <davem@davemloft.net> Cc: Dan Carpenter <error27@gmail.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Florian Mickler <florian@mickler.org> Cc: Guillaume Chazarain <guichaz@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13user_ns: improve the user_ns on-the-slab packagingPavel Emelyanov
Currently on 64-bit arch the user_namespace is 2096 and when being kmalloc-ed it resides on a 4k slab wasting 2003 bytes. If we allocate a separate cache for it and reduce the hash size from 128 to 64 chains the packaging becomes *much* better - the struct is 1072 bytes and the hole between is 98 bytes. [akpm@linux-foundation.org: s/__initcall/module_init/] Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Serge E. Hallyn <serge@hallyn.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13sysctl: remove obsolete commentsJovi Zhang
ctl_unnumbered.txt have been removed in Documentation directory so just also remove this invalid comments [akpm@linux-foundation.org: fix Documentation/sysctl/00-INDEX, per Dave] Signed-off-by: Jovi Zhang <bookjovi@gmail.com> Cc: Dave Young <hidave.darkstar@gmail.com> Acked-by: WANG Cong <xiyou.wangcong@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13sysctl: fix #ifdef guard commentJovi Zhang
Signed-off-by: Jovi Zhang <bookjovi@gmail.com> Acked-by: WANG Cong <xiyou.wangcong@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13fs/proc/base.c, kernel/latencytop.c: convert sprintf_symbol() to %psJoe Perches
Use temporary lr for struct latency_record for improved readability and fewer columns used. Removed trailing space from output. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Joe Perches <joe@perches.com> Cc: Jiri Kosina <trivial@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13printk: use RCU to prevent potential lock contention in kmsg_dumpHuang Ying
dump_list_lock is used to protect dump_list in kmsg_dumper implementation, kmsg_dump() uses it to traverse dump_list too. But if there is contention on the lock, kmsg_dump() will fail, and the valuable kernel message may be lost. This patch solves this issue with RCU. Because kmsg_dump() only read the list, no lock is needed in kmsg_dump(). So that kmsg_dump() will never fail because of lock contention. Signed-off-by: Huang Ying <ying.huang@intel.com> Cc: "Paul E. McKenney" <paulmck@us.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13kptr_restrict for hiding kernel pointers from unprivileged usersDan Rosenberg
Add the %pK printk format specifier and the /proc/sys/kernel/kptr_restrict sysctl. The %pK format specifier is designed to hide exposed kernel pointers, specifically via /proc interfaces. Exposing these pointers provides an easy target for kernel write vulnerabilities, since they reveal the locations of writable structures containing easily triggerable function pointers. The behavior of %pK depends on the kptr_restrict sysctl. If kptr_restrict is set to 0, no deviation from the standard %p behavior occurs. If kptr_restrict is set to 1, the default, if the current user (intended to be a reader via seq_printf(), etc.) does not have CAP_SYSLOG (currently in the LSM tree), kernel pointers using %pK are printed as 0's. If kptr_restrict is set to 2, kernel pointers using %pK are printed as 0's regardless of privileges. Replacing with 0's was chosen over the default "(null)", which cannot be parsed by userland %p, which expects "(nil)". [akpm@linux-foundation.org: check for IRQ context when !kptr_restrict, save an indent level, s/WARN/WARN_ONCE/] [akpm@linux-foundation.org: coding-style fixup] [randy.dunlap@oracle.com: fix kernel/sysctl.c warning] Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com> Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: James Morris <jmorris@namei.org> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Thomas Graf <tgraf@infradead.org> Cc: Eugene Teo <eugeneteo@kernel.org> Cc: Kees Cook <kees.cook@canonical.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: David S. Miller <davem@davemloft.net> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Eric Paris <eparis@parisplace.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13kernel: clean up USE_GENERIC_SMP_HELPERSAmerigo Wang
For arch which needs USE_GENERIC_SMP_HELPERS, it has to select USE_GENERIC_SMP_HELPERS, rather than leaving a choice to user, since they don't provide their own implementions. Also, move on_each_cpu() to kernel/smp.c, it is strange to put it in kernel/softirq.c. For arch which doesn't use USE_GENERIC_SMP_HELPERS, e.g. blackfin, only on_each_cpu() is compiled. Signed-off-by: Amerigo Wang <amwang@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13kmsg_dump: add kmsg_dump() calls to the reboot, halt, poweroff and ↵Seiji Aguchi
emergency_restart paths We need to know the reason why system rebooted in support service. However, we can't inform our customers of the reason because final messages are lost on current Linux kernel. This patch improves the situation above because the final messages are saved by adding kmsg_dump() to reboot, halt, poweroff and emergency_restart path. Signed-off-by: Seiji Aguchi <seiji.aguchi@hds.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Marco Stornelli <marco.stornelli@gmail.com> Reviewed-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-12switch cgroupAl Viro
switching it to s_d_op allows to kill the cgroup_lookup() kludge. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-11Merge branch 'perf-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (28 commits) perf session: Fix infinite loop in __perf_session__process_events perf evsel: Support perf_evsel__open(cpus > 1 && threads > 1) perf sched: Use PTHREAD_STACK_MIN to avoid pthread_attr_setstacksize() fail perf tools: Emit clearer message for sys_perf_event_open ENOENT return perf stat: better error message for unsupported events perf sched: Fix allocation result check perf, x86: P4 PMU - Fix unflagged overflows handling dynamic debug: Fix build issue with older gcc tracing: Fix TRACE_EVENT power tracepoint creation tracing: Fix preempt count leak tracepoint: Add __rcu annotation tracing: remove duplicate null-pointer check in skb tracepoint tracing/trivial: Add missing comma in TRACE_EVENT comment tracing: Include module.h in define_trace.h x86: Save rbp in pt_regs on irq entry x86, dumpstack: Fix unused variable warning x86, NMI: Clean-up default_do_nmi() x86, NMI: Allow NMI reason io port (0x61) to be processed on any CPU x86, NMI: Remove DIE_NMI_IPI x86, NMI: Add priorities to handlers ...
2011-01-10Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6: (30 commits) MAINTAINERS: Add tomoyo-dev-en ML. SELinux: define permissions for DCB netlink messages encrypted-keys: style and other cleanup encrypted-keys: verify datablob size before converting to binary trusted-keys: kzalloc and other cleanup trusted-keys: additional TSS return code and other error handling syslog: check cap_syslog when dmesg_restrict Smack: Transmute labels on specified directories selinux: cache sidtab_context_to_sid results SELinux: do not compute transition labels on mountpoint labeled filesystems This patch adds a new security attribute to Smack called SMACK64EXEC. It defines label that is used while task is running. SELinux: merge policydb_index_classes and policydb_index_others selinux: convert part of the sym_val_to_name array to use flex_array selinux: convert type_val_to_struct to flex_array flex_array: fix flex_array_put_ptr macro to be valid C SELinux: do not set automatic i_ino in selinuxfs selinux: rework security_netlbl_secattr_to_sid SELinux: standardize return code handling in selinuxfs.c SELinux: standardize return code handling in selinuxfs.c SELinux: standardize return code handling in policydb.c ...
2011-01-10Merge branch 'kbuild' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild-2.6 * 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild-2.6: mkuboot.sh: Fail if mkimage is missing gen_init_cpio: checkpatch fixes gen_init_cpio: Avoid race between call to stat() and call to open() modpost: Fix address calculation in reloc_location() Make fixdep error handling more explicit checksyscalls: Fix stand-alone usage modpost: Put .zdebug* section on white list kbuild: fix interaction of CONFIG_IKCONFIG and KCONFIG_CONFIG kbuild: export linux/{a.out,kvm,kvm_para}.h on headers_install_all kbuild: introduce HDR_ARCH_LIST for headers_install_all headers_install: check exit status of unifdef gen_init_cpio: remove leading `/' from file names scripts/genksyms: fix header usage fixdep: use hash table instead of a single array
2011-01-10Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6: spi / PM: Support dev_pm_ops PM: Prototype the pm_generic_ operations PM / Runtime: Generic resume shouldn't set RPM_ACTIVE unconditionally PM: Use dev_name() in core device suspend and resume routines PM: Permit registration of parentless devices during system suspend PM: Replace the device power.status field with a bit field PM: Remove redundant checks from core device resume routines PM: Use a different list of devices for each stage of device suspend PM: Avoid compiler warning in pm_noirq_op() PM: Use pm_wakeup_pending() in __device_suspend() PM / Wakeup: Replace pm_check_wakeup_events() with pm_wakeup_pending() PM: Prevent dpm_prepare() from returning errors unnecessarily PM: Fix references to basic-pm-debugging.txt in drivers-testing.txt PM / Runtime: Add synchronous runtime interface for interrupt handlers (v3) PM / Hibernate: When failed, in_suspend should be reset PM / Hibernate: hibernation_ops->leave should be checked too Freezer: Fix a race during freezing of TASK_STOPPED tasks PM: Use proper ccflag flag in kernel/power/Makefile PM / Runtime: Fix comments to match runtime callback code
2011-01-10block: ensure that completion error gets properly tracedJens Axboe
We normally just use the BIO_UPTODATE flag to signal 0/-EIO. If we have more information available, we should pass that along to the trace output. Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-01-10Merge branch 'master' into nextJames Morris
Conflicts: security/smack/smack_lsm.c Verified and added fix by Stephen Rothwell <sfr@canb.auug.org.au> Ok'd by Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: James Morris <jmorris@namei.org>
2011-01-09Merge branch 'tip/perf/core' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into perf/urgent
2011-01-07tracing: Fix TRACE_EVENT power tracepoint creationMathieu Desnoyers
DEFINE_TRACE should also exist when CONFIG_EVENT_TRACING=n. Otherwise, setting only TRACEPOINTS=y is broken. Acked-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> LKML-Reference: <20101028153117.GA4051@Krystal> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-01-07tracing: Fix preempt count leakLi Zefan
While running my ftrace stress test, this showed up: BUG: sleeping function called from invalid context at mm/mmap.c:233 ... note: cat[3293] exited with preempt_count 1 The bug was introduced by commit 91e86e560d0b3ce4c5fc64fd2bbb99f856a30a4e ("tracing: Fix recursive user stack trace") Cc: <stable@kernel.org> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> LKML-Reference: <4D0089AC.1020802@cn.fujitsu.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-01-07Merge branch 'for-2.6.38' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu * 'for-2.6.38' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (30 commits) gameport: use this_cpu_read instead of lookup x86: udelay: Use this_cpu_read to avoid address calculation x86: Use this_cpu_inc_return for nmi counter x86: Replace uses of current_cpu_data with this_cpu ops x86: Use this_cpu_ops to optimize code vmstat: User per cpu atomics to avoid interrupt disable / enable irq_work: Use per cpu atomics instead of regular atomics cpuops: Use cmpxchg for xchg to avoid lock semantics x86: this_cpu_cmpxchg and this_cpu_xchg operations percpu: Generic this_cpu_cmpxchg() and this_cpu_xchg support percpu,x86: relocate this_cpu_add_return() and friends connector: Use this_cpu operations xen: Use this_cpu_inc_return taskstats: Use this_cpu_ops random: Use this_cpu_inc_return fs: Use this_cpu_inc_return in buffer.c highmem: Use this_cpu_xx_return() operations vmstat: Use this_cpu_inc_return for vm statistics x86: Support for this_cpu_add, sub, dec, inc_return percpu: Generic support for this_cpu_add, sub, dec, inc_return ... Fixed up conflicts: in arch/x86/kernel/{apic/nmi.c, apic/x2apic_uv_x.c, process.c} as per Tejun.
2011-01-07Merge branch 'for-2.6.38' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wqLinus Torvalds
* 'for-2.6.38' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: (33 commits) usb: don't use flush_scheduled_work() speedtch: don't abuse struct delayed_work media/video: don't use flush_scheduled_work() media/video: explicitly flush request_module work ioc4: use static work_struct for ioc4_load_modules() init: don't call flush_scheduled_work() from do_initcalls() s390: don't use flush_scheduled_work() rtc: don't use flush_scheduled_work() mmc: update workqueue usages mfd: update workqueue usages dvb: don't use flush_scheduled_work() leds-wm8350: don't use flush_scheduled_work() mISDN: don't use flush_scheduled_work() macintosh/ams: don't use flush_scheduled_work() vmwgfx: don't use flush_scheduled_work() tpm: don't use flush_scheduled_work() sonypi: don't use flush_scheduled_work() hvsi: don't use flush_scheduled_work() xen: don't use flush_scheduled_work() gdrom: don't use flush_scheduled_work() ... Fixed up trivial conflict in drivers/media/video/bt8xx/bttv-input.c as per Tejun.
2011-01-07Merge branch 'sched-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: sched: Constify function scope static struct sched_param usage sched: Fix strncmp operation sched: Move sched_autogroup_exit() to free_signal_struct() sched: Fix struct autogroup memory leak sched: Mark autogroup_init() __init sched: Consolidate the name of root_task_group and init_task_group
2011-01-07Merge branch 'tty-next' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty-2.6 * 'tty-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty-2.6: (36 commits) serial: apbuart: Fixup apbuart_console_init() TTY: Add tty ioctl to figure device node of the system console. tty: add 'active' sysfs attribute to tty0 and console device drivers: serial: apbuart: Handle OF failures gracefully Serial: Avoid unbalanced IRQ wake disable during resume tty: fix typos/errors in tty_driver.h comments pch_uart : fix warnings for 64bit compile 8250: fix uninitialized FIFOs ip2: fix compiler warning on ip2main_pci_tbl specialix: fix compiler warning on specialix_pci_tbl rocket: fix compiler warning on rocket_pci_ids 8250: add a UPIO_DWAPB32 for 32 bit accesses 8250: use container_of() instead of casting serial: omap-serial: Add support for kernel debugger serial: fix pch_uart kconfig & build drivers: char: hvc: add arm JTAG DCC console support RS485 documentation: add 16C950 UART description serial: ifx6x60: fix memory leak serial: ifx6x60: free IRQ on error Serial: EG20T: add PCH_UART driver ... Fixed up conflicts in drivers/serial/apbuart.c with evil merge that makes the code look fairly sane (unlike either side).
2011-01-07Merge branch 'vfs-scale-working' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/npiggin/linux-npiggin * 'vfs-scale-working' of git://git.kernel.org/pub/scm/linux/kernel/git/npiggin/linux-npiggin: (57 commits) fs: scale mntget/mntput fs: rename vfsmount counter helpers fs: implement faster dentry memcmp fs: prefetch inode data in dcache lookup fs: improve scalability of pseudo filesystems fs: dcache per-inode inode alias locking fs: dcache per-bucket dcache hash locking bit_spinlock: add required includes kernel: add bl_list xfs: provide simple rcu-walk ACL implementation btrfs: provide simple rcu-walk ACL implementation ext2,3,4: provide simple rcu-walk ACL implementation fs: provide simple rcu-walk generic_check_acl implementation fs: provide rcu-walk aware permission i_ops fs: rcu-walk aware d_revalidate method fs: cache optimise dentry and inode for rcu-walk fs: dcache reduce branches in lookup path fs: dcache remove d_mounted fs: fs_struct use seqlock fs: rcu-walk for path lookup ...
2011-01-07sched: Constify function scope static struct sched_param usagePeter Zijlstra
Function-scope statics are discouraged because they are easily overlooked and can cause subtle bugs/races due to their global (non-SMP safe) nature. Linus noticed that we did this for sched_param - at minimum make the const. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: Message-ID: <AANLkTinotRxScOHEb0HgFgSpGPkq_6jKTv5CfvnQM=ee@mail.gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-07sched: Fix strncmp operationHillf Danton
One of the operands, buf, is incorrect, since it is stripped and the correct address for subsequent string comparing could change if leading white spaces, if any, are removed from buf. It is fixed by replacing buf with cmp. Signed-off-by: Hillf Danton <dhillf@gmail.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <AANLkTinOPuYsVovrZpbuCCmG5deEyc8WgA_A1RJx_YK7@mail.gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-07sched: Move sched_autogroup_exit() to free_signal_struct()Mike Galbraith
Per Oleg's suggestion, undo fork failure free/put_signal_struct change, and move sched_autogroup_exit() to free_signal_struct() instead. Signed-off-by: Mike Galbraith <efault@gmx.de> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1294222564.8369.6.camel@marge.simson.net> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-07sched: Fix struct autogroup memory leakMike Galbraith
Seems I lost a change somewhere, leaking memory. sched: fix struct autogroup memory leak Add missing change to actually use autogroup_free(). Signed-off-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1294222285.8369.2.camel@marge.simson.net> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-07sched: Mark autogroup_init() __initYong Zhang
autogroup_init() is only called at boot time. Signed-off-by: Yong Zhang <yong.zhang0@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1294375425-31065-1-git-send-email-yong.zhang0@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-07sched: Consolidate the name of root_task_group and init_task_groupYong Zhang
root_task_group is the leftover of USER_SCHED, now it's always same to init_task_group. But as Mike suggested, root_task_group is maybe the suitable name to keep for a tree. So in this patch: init_task_group --> root_task_group init_task_group_load --> root_task_group_load INIT_TASK_GROUP_LOAD --> ROOT_TASK_GROUP_LOAD Suggested-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Yong Zhang <yong.zhang0@gmail.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20110107071736.GA32635@windriver.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-07perf_events: Add perf_event_time()Stephane Eranian
Adds perf_event_time() to try and centralize access to event timing and in particular ctx->time. Prepares for cgroup support. Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <4d22059c.122ae30a.5e0e.ffff8b8b@mx.google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-07perf_events: Generalize use of event_filter_match()Stephane Eranian
Replace all occurrences of: event->cpu != -1 && event->cpu == smp_processor_id() by a call to: event_filter_match(event) This makes the code more consistent and will make the cgroup patch smaller. Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <4d220593.2308e30a.48c5.ffff8ae9@mx.google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-07perf_events: Move code around to prepare for cgroupStephane Eranian
In particular this patch move perf_event_exit_task() before cgroup_exit() to allow for cgroup support. The cgroup_exit() function detaches the cgroups attached to a task. Other movements include hoisting some definitions and inlines at the top of perf_event.c Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <4d22058b.cdace30a.4657.ffff95b1@mx.google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-07blktrace: add missing probe argument to block_bio_completeMathieu Desnoyers
blktrace.c block bio complete callback needs to gain a new argument to reflect the newly added "error" tracepoint argument. This is needed to match the new block_bio_complete TRACE_EVENT as of commit de983a7bfcb7c020901ca6e2314cf55a4207ab5a. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> CC: Jeff Moyer <jmoyer@redhat.com> CC: Steven Rostedt <rostedt@goodmis.org> CC: Frederic Weisbecker <fweisbec@gmail.com> CC: Ingo Molnar <mingo@elte.hu> CC: Thomas Gleixner <tglx@linutronix.de> CC: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-01-07fs: dcache reduce branches in lookup pathNick Piggin
Reduce some branches and memory accesses in dcache lookup by adding dentry flags to indicate common d_ops are set, rather than having to check them. This saves a pointer memory access (dentry->d_op) in common path lookup situations, and saves another pointer load and branch in cases where we have d_op but not the particular operation. Patched with: git grep -E '[.>]([[:space:]])*d_op([[:space:]])*=' | xargs sed -e 's/\([^\t ]*\)->d_op = \(.*\);/d_set_d_op(\1, \2);/' -e 's/\([^\t ]*\)\.d_op = \(.*\);/d_set_d_op(\&\1, \2);/' -i Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07fs: dcache rationalise dget variantsNick Piggin
dget_locked was a shortcut to avoid the lazy lru manipulation when we already held dcache_lock (lru manipulation was relatively cheap at that point). However, how that the lru lock is an innermost one, we never hold it at any caller, so the lock cost can now be avoided. We already have well working lazy dcache LRU, so it should be fine to defer LRU manipulations to scan time. Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07fs: dcache remove dcache_lockNick Piggin
dcache_lock no longer protects anything. remove it. Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07fs: dcache scale subdirsNick Piggin
Protect d_subdirs and d_child with d_lock, except in filesystems that aren't using dcache_lock for these anyway (eg. using i_mutex). Note: if we change the locking rule in future so that ->d_child protection is provided only with ->d_parent->d_lock, it may allow us to reduce some locking. But it would be an exception to an otherwise regular locking scheme, so we'd have to see some good results. Probably not worthwhile. Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07fs: dcache scale dentry refcountNick Piggin
Make d_count non-atomic and protect it with d_lock. This allows us to ensure a 0 refcount dentry remains 0 without dcache_lock. It is also fairly natural when we start protecting many other dentry members with d_lock. Signed-off-by: Nick Piggin <npiggin@kernel.dk>