aboutsummaryrefslogtreecommitdiff
path: root/kernel/kexec.c
AgeCommit message (Collapse)Author
2014-01-27kernel/kexec.c: use vscnprintf() instead of vsnprintf() in ↵Chen Gang
vmcoreinfo_append_str() vsnprintf() may let 'r' larger than sizeof(buf), in this case, if 'r' is also less than "vmcoreinfo_max_size - vmcoreinfo_size" (left size of destination buffer), next memcpy() will read the unexpected addresses. Signed-off-by: Chen Gang <gang.chen@asianux.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-23kexec: add sysctl to disable kexec_loadKees Cook
For general-purpose (i.e. distro) kernel builds it makes sense to build with CONFIG_KEXEC to allow end users to choose what kind of things they want to do with kexec. However, in the face of trying to lock down a system with such a kernel, there needs to be a way to disable kexec_load (much like module loading can be disabled). Without this, it is too easy for the root user to modify kernel memory even when CONFIG_STRICT_DEVMEM and modules_disabled are set. With this change, it is still possible to load an image for use later, then disable kexec_load so the image (or lack of image) can't be altered. The intention is for using this in environments where "perfect" enforcement is hard. Without a verified boot, along with verified modules, and along with verified kexec, this is trying to give a system a better chance to defend itself (or at least grow the window of discoverability) against attack in the face of a privilege escalation. In my mind, I consider several boot scenarios: 1) Verified boot of read-only verified root fs loading fd-based verification of kexec images. 2) Secure boot of writable root fs loading signed kexec images. 3) Regular boot loading kexec (e.g. kcrash) image early and locking it. 4) Regular boot with no control of kexec image at all. 1 and 2 don't exist yet, but will soon once the verified kexec series has landed. 4 is the state of things now. The gap between 2 and 4 is too large, so this change creates scenario 3, a middle-ground above 4 when 2 and 1 are not possible for a system. Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Rik van Riel <riel@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Eric Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-12-18kexec: migrate to reboot cpuVivek Goyal
Commit 1b3a5d02ee07 ("reboot: move arch/x86 reboot= handling to generic kernel") moved reboot= handling to generic code. In the process it also removed the code in native_machine_shutdown() which are moving reboot process to reboot_cpu/cpu0. I guess that thought must have been that all reboot paths are calling migrate_to_reboot_cpu(), so we don't need this special handling. But kexec reboot path (kernel_kexec()) is not calling migrate_to_reboot_cpu() so above change broke kexec. Now reboot can happen on non-boot cpu and when INIT is sent in second kerneo to bring up BP, it brings down the machine. So start calling migrate_to_reboot_cpu() in kexec reboot path to avoid this problem. Bisected by WANG Chao. Reported-by: Matthew Whitehead <mwhitehe@redhat.com> Reported-by: Dave Young <dyoung@redhat.com> Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Tested-by: Baoquan He <bhe@redhat.com> Tested-by: WANG Chao <chaowang@redhat.com> Acked-by: H. Peter Anvin <hpa@linux.intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-12-07PCI: Disable Bus Master only on kexec rebootKhalid Aziz
Add a flag to tell the PCI subsystem that kernel is shutting down in preparation to kexec a kernel. Add code in PCI subsystem to use this flag to clear Bus Master bit on PCI devices only in case of kexec reboot. This fixes a power-off problem on Acer Aspire V5-573G and likely other machines and avoids any other issues caused by clearing Bus Master bit on PCI devices in normal shutdown path. The problem was introduced by b566a22c2332 ("PCI: disable Bus Master on PCI device shutdown"). This patch is based on discussion at http://marc.info/?l=linux-pci&m=138425645204355&w=2 Link: https://bugzilla.kernel.org/show_bug.cgi?id=63861 Reported-by: Chang Liu <cl91tp@gmail.com> Signed-off-by: Khalid Aziz <khalid.aziz@oracle.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Konstantin Khlebnikov <koct9i@gmail.com> Cc: stable@vger.kernel.org # v3.5+
2013-11-15Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull trivial tree updates from Jiri Kosina: "Usual earth-shaking, news-breaking, rocket science pile from trivial.git" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (23 commits) doc: usb: Fix typo in Documentation/usb/gadget_configs.txt doc: add missing files to timers/00-INDEX timekeeping: Fix some trivial typos in comments mm: Fix some trivial typos in comments irq: Fix some trivial typos in comments NUMA: fix typos in Kconfig help text mm: update 00-INDEX doc: Documentation/DMA-attributes.txt fix typo DRM: comment: `halve' -> `half' Docs: Kconfig: `devlopers' -> `developers' doc: typo on word accounting in kprobes.c in mutliple architectures treewide: fix "usefull" typo treewide: fix "distingush" typo mm/Kconfig: Grammar s/an/a/ kexec: Typo s/the/then/ Documentation/kvm: Update cpuid documentation for steal time and pv eoi treewide: Fix common typo in "identify" __page_to_pfn: Fix typo in comment Correct some typos for word frequency clk: fixed-factor: Fix a trivial typo ...
2013-10-14kexec: Typo s/the/then/Geert Uytterhoeven
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2013-09-11kexec: remove unnecessary returnXishi Qiu
Code can not run here forever, so remove the unnecessary return. Signed-off-by: Xishi Qiu <qiuxishi@huawei.com> Suggested-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Reviewed-by: Simon Horman <horms@verge.net.au> Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-30kexec: Use min() and min_t() to simplify logicZhang Yanfei
Simplify the logic of variable assignments. [akpm@linux-foundation.org: replace min_t with min, remove unneeded casts] Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Reviewed-by: Simon Horman <horms@verge.net.au> Cc: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-30kexec: fix wrong types of some local variablesZhang Yanfei
The types of the following local variables: - ubytes/mbytes in kimage_load_crash_segment()/kimage_load_normal_segment() - r in vmcoreinfo_append_str() are wrong, so fix them. Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Simon Horman <horms@verge.net.au> Cc: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29kexec, vmalloc: export additional vmalloc layer informationAtsushi Kumagai
Now, vmap_area_list is exported as VMCOREINFO for makedumpfile to get the start address of vmalloc region (vmalloc_start). The address which contains vmalloc_start value is represented as below: vmap_area_list.next - OFFSET(vmap_area.list) + OFFSET(vmap_area.va_start) However, both OFFSET(vmap_area.va_start) and OFFSET(vmap_area.list) aren't exported as VMCOREINFO. So this patch exports them externally with small cleanup. [akpm@linux-foundation.org: vmalloc.h should include list.h for list_head] Signed-off-by: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp> Cc: Joonsoo Kim <js1304@gmail.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Dave Anderson <anderson@redhat.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> Cc: Ingo Molnar <mingo@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29mm, vmalloc: export vmap_area_list, instead of vmlistJoonsoo Kim
Although our intention is to unexport internal structure entirely, but there is one exception for kexec. kexec dumps address of vmlist and makedumpfile uses this information. We are about to remove vmlist, then another way to retrieve information of vmalloc layer is needed for makedumpfile. For this purpose, we export vmap_area_list, instead of vmlist. Signed-off-by: Joonsoo Kim <js1304@gmail.com> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Dave Anderson <anderson@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29mm,kexec: use common help functions to free reserved pagesJiang Liu
Use common help functions to free reserved pages. Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Cc: Eric Biederman <ebiederm@xmission.com> Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-17kexec: use Crash kernel for Crash kernel lowYinghai Lu
We can extend kexec-tools to support multiple "Crash kernel" in /proc/iomem instead. So we can use "Crash kernel" instead of "Crash kernel low" in /proc/iomem. Suggested-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1366089828-19692-3-git-send-email-yinghai@kernel.org Acked-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-04-17x86, kdump: Change crashkernel_high/low= to crashkernel=,high/lowYinghai Lu
Per hpa, use crashkernel=X,high crashkernel=Y,low instead of crashkernel_hign=X crashkernel_low=Y. As that could be extensible. -v2: according to Vivek, change delimiter to ; -v3: let hign and low only handle simple form and it conforms to description in kernel-parameters.txt still keep crashkernel=X override any crashkernel=X,high crashkernel=Y,low -v4: update get_last_crashkernel returning and add more strict checking in parse_crashkernel_simple() found by HATAYAMA. -v5: Change delimiter back to , according to HPA. also separate parse_suffix from parse_simper according to vivek. so we can avoid @pos in that path. -v6: Tight the checking about crashkernel=X,highblahblah,high found by HTYAYAMA. Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1366089828-19692-5-git-send-email-yinghai@kernel.org Acked-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-04-17x86, kdump: Retore crashkernel= to allocate under 896MYinghai Lu
Vivek found old kexec-tools does not work new kernel anymore. So change back crashkernel= back to old behavoir, and add crashkernel_high= to let user decide if buffer could be above 4G, and also new kexec-tools will be needed. -v2: let crashkernel=X override crashkernel_high= update description about _high will be ignored by crashkernel=X -v3: update description about kernel-parameters.txt according to Vivek. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1366089828-19692-4-git-send-email-yinghai@kernel.org Acked-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-02-27kexec: avoid freeing NULL pointer in image_crash_alloc()Zhang Yanfei
Though there is no error if we free a NULL pointer, I think we could avoid this behaviour. Change the code a little in kimage_crash_alloc() could avoid this kind of unnecessary free. Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Sasha Levin <sasha.levin@oracle.com> Reviewed-by: Simon Horman <horms@verge.net.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-27kexec: fix memory leak in function kimage_normal_allocZhang Yanfei
If kimage_normal_alloc() fails to alloc pages for image->swap_page, it should call kimage_free_page_list() to free allocated pages in image->control_pages list before it frees image. Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Sasha Levin <sasha.levin@oracle.com> Reviewed-by: Simon Horman <horms@verge.net.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-27kexec: prevent double free on image allocation failureSasha Levin
If kimage_normal_alloc() fails to initialize an allocated kimage, it will free the image but would still set 'rimage', as a result kexec_load will try to free it again. This would explode as part of the freeing process is accessing internal members which point to uninitialized memory. Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-27kexec: export PG_hwpoison flag into vmcoreinfoMitsuhiro Tanino
This patch exports a PG_hwpoison into vmcoreinfo when CONFIG_MEMORY_FAILURE is defined. "makedumpfile" needs to read information of memory, such as 'mem_section', 'zone', 'pageflags' from vmcore. We introduce a function into "makedumpfile" to exclude hwpoison page from vmcore dump. In order to introduce this function, PG_hwpoison flag have to export into vmcoreinfo. Signed-off-by: Mitsuhiro Tanino <mitsuhiro.tanino.gm@hitachi.com> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Mitsuhiro Tanino <mitsuhiro.tanino.gm@hitachi.com> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-27kexec: get rid of duplicate check for hole_endZhang Yanfei
hole_end has been checked to make sure it is <= crash_res.end in the while condition check, so the if condition check is duplicate. Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-27kexec: add the values related to buddy system for filtering free pages.Atsushi Kumagai
tAdd adds the values related to buddy system to vmcoreinfo data so that makedumpfile (dump filtering command) can filter out all free pages with the new logic. It's faster than the current logic because it can distinguish free page by analyzing page structure at the same time as filtering for other unnecessary pages (e.g. anonymous page). OTOH, the current logic has to trace free_list to distinguish free pages while analyzing page structure to filter out other unnecessary pages. The new logic uses the fact that buddy page is marked by _mapcount == PAGE_BUDDY_MAPCOUNT_VALUE. But, _mapcount shares its memory with other fields for SLAB/SLUB when PG_slab is set, so we need to check if PG_slab is set or not before looking up _mapcount value. And we can get the order of buddy system from private field. To sum it up, the values below are required for this logic. Required values: - OFFSET(page._mapcount) - OFFSET(page.private) - NUMBER(PG_slab) - NUMBER(PAGE_BUDDY_MAPCOUNT_VALUE) Changelog from v1 to v2: 1. remove SIZE(pageflags) The new logic was changed after I sent v1 patch. Accordingly, SIZE(pageflags) has been unnecessary for makedumpfile. What's makedumpfile: makedumpfile creates a small dumpfile by excluding unnecessary pages for the analysis. To distinguish unnecessary pages, makedumpfile gets the vmcoreinfo data which has the minimum debugging information only for dump filtering. Signed-off-by: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-01-29x86: Add Crash kernel low reservationYinghai Lu
During kdump kernel's booting stage, it need to find low ram for swiotlb buffer when system does not support intel iommu/dmar remapping. kexed-tools is appending memmap=exactmap and range from /proc/iomem with "Crash kernel", and that range is above 4G for 64bit after boot protocol 2.12. We need to add another range in /proc/iomem like "Crash kernel low", so kexec-tools could find that info and append to kdump kernel command line. Try to reserve some under 4G if the normal "Crash kernel" is above 4G. User could specify the size with crashkernel_low=XX[KMG]. -v2: fix warning that is found by Fengguang's test robot. -v3: move out get_mem_size change to another patch, to solve compiling warning that is found by Borislav Petkov <bp@alien8.de> -v4: user must specify crashkernel_low if system does not support intel or amd iommu. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1359058816-7615-31-git-send-email-yinghai@kernel.org Cc: Eric Biederman <ebiederm@xmission.com> Cc: Rob Landley <rob@landley.net> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-10-06kdump: remove unneeded includeWei Yongjun
The inclusion of <generated/utsrelease.h> is unnecessary. Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Reviewed-by: Simon Horman <horms@verge.net.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-30kdump: append newline to the last lien of vmcoreinfo noteVivek Goyal
The last line of vmcoreinfo note does not end with \n. Parsing all the lines in note becomes easier if all lines end with \n instead of trying to special case the last line. I know at least one tool, vmcore-dmesg in kexec-tools tree which made the assumption that all lines end with \n. I think it is a good idea to fix it. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-28Merge branch 'akpm' (Andrew's patch-bomb)Linus Torvalds
Merge third batch of patches from Andrew Morton: - Some MM stragglers - core SMP library cleanups (on_each_cpu_mask) - Some IPI optimisations - kexec - kdump - IPMI - the radix-tree iterator work - various other misc bits. "That'll do for -rc1. I still have ~10 patches for 3.4, will send those along when they've baked a little more." * emailed from Andrew Morton <akpm@linux-foundation.org>: (35 commits) backlight: fix typo in tosa_lcd.c crc32: add help text for the algorithm select option mm: move hugepage test examples to tools/testing/selftests/vm mm: move slabinfo.c to tools/vm mm: move page-types.c from Documentation to tools/vm selftests/Makefile: make `run_tests' depend on `all' selftests: launch individual selftests from the main Makefile radix-tree: use iterators in find_get_pages* functions radix-tree: rewrite gang lookup using iterator radix-tree: introduce bit-optimized iterator fs/proc/namespaces.c: prevent crash when ns_entries[] is empty nbd: rename the nbd_device variable from lo to nbd pidns: add reboot_pid_ns() to handle the reboot syscall sysctl: use bitmap library functions ipmi: use locks on watchdog timeout set on reboot ipmi: simplify locking ipmi: fix message handling during panics ipmi: use a tasklet for handling received messages ipmi: increase KCS timeouts ipmi: decrease the IPMI message transaction time in interrupt mode ...
2012-03-28kexec: add further check to crashkernelZhenzhong Duan
When using crashkernel=2M-256M, the kernel doesn't give any warning. This is misleading sometimes. Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-28kexec: crash: don't save swapper_pg_dir for !CONFIG_MMU configurationsWill Deacon
nommu platforms don't have very interesting swapper_pg_dir pointers and usually just #define them to NULL, meaning that we can't include them in the vmcoreinfo on the kexec crash path. This patch only saves the swapper_pg_dir if we have an MMU. Signed-off-by: Will Deacon <will.deacon@arm.com> Reviewed-by: Simon Horman <horms@verge.net.au> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-28Remove all #inclusions of asm/system.hDavid Howells
Remove all #inclusions of asm/system.h preparatory to splitting and killing it. Performed with the following command: perl -p -i -e 's!^#\s*include\s*<asm/system[.]h>.*\n!!' `grep -Irl '^#\s*include\s*<asm/system[.]h>' *` Signed-off-by: David Howells <dhowells@redhat.com>
2012-01-29PM / Sleep: Introduce "late suspend" and "early resume" of devicesRafael J. Wysocki
The current device suspend/resume phases during system-wide power transitions appear to be insufficient for some platforms that want to use the same callback routines for saving device states and related operations during runtime suspend/resume as well as during system suspend/resume. In principle, they could point their .suspend_noirq() and .resume_noirq() to the same callback routines as their .runtime_suspend() and .runtime_resume(), respectively, but at least some of them require device interrupts to be enabled while the code in those routines is running. It also makes sense to have device suspend-resume callbacks that will be executed with runtime PM disabled and with device interrupts enabled in case someone needs to run some special code in that context during system-wide power transitions. Apart from this, .suspend_noirq() and .resume_noirq() were introduced as a workaround for drivers using shared interrupts and failing to prevent their interrupt handlers from accessing suspended hardware. It appears to be better not to use them for other porposes, or we may have to deal with some serious confusion (which seems to be happening already). For the above reasons, introduce new device suspend/resume phases, "late suspend" and "early resume" (and analogously for hibernation) whose callback will be executed with runtime PM disabled and with device interrupts enabled and whose callback pointers generally may point to runtime suspend/resume routines. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Reviewed-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Reviewed-by: Kevin Hilman <khilman@ti.com>
2012-01-12kdump: crashk_res init check for /sys/kernel/kexec_crash_sizeMichael Holzheu
Currently it is possible to set the crash_size via the sysfs /sys/kernel/kexec_crash_size even if no crash kernel memory has been defined with the "crashkernel" parameter. In this case "crashk_res" is not initialized and crashk_res.start = crashk_res.end = 0. Unfortunately resource_size(&crashk_res) returns 1 in this case. This breaks the s390 implementation of crash_(un)map_reserved_pages(). To fix the problem the correct "old_size" is now calculated in crash_shrink_memory(). "old_size is set to "0" if crashk_res is not initialized. With this change crash_shrink_memory() will do nothing, when "crashk_res" is not initialized. It will return "0" for "echo 0 > /sys/kernel/kexec_crash_size" and -EINVAL for "echo [not zero] > /sys/kernel/kexec_crash_size". In addition to that this patch also simplifies the "ret = -EINVAL" vs. "ret = 0" logic as suggested by Simon Horman. Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Reviewed-by: Dave Young <dyoung@redhat.com> Reviewed-by: WANG Cong <xiyou.wangcong@gmail.com> Reviewed-by: Simon Horman <horms@verge.net.au> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-12kdump: add missing RAM resource in crash_shrink_memory()Michael Holzheu
When shrinking crashkernel memory using /sys/kernel/kexec_crash_size for the newly added memory no RAM resource is created at the moment. Example: $ cat /proc/iomem 00000000-bfffffff : System RAM 00000000-005b7ac3 : Kernel code 005b7ac4-009743bf : Kernel data 009bb000-00a85c33 : Kernel bss c0000000-cfffffff : Crash kernel d0000000-ffffffff : System RAM $ echo 0 > /sys/kernel/kexec_crash_size $ cat /proc/iomem 00000000-bfffffff : System RAM 00000000-005b7ac3 : Kernel code 005b7ac4-009743bf : Kernel data 009bb000-00a85c33 : Kernel bss <<-- here is System RAM missing d0000000-ffffffff : System RAM One result of this bug is that the memory chunk can never be set offline using memory hotplug. With this patch I insert a new "System RAM" resource for the released memory. Then the upper example looks like the following: $ echo 0 > /sys/kernel/kexec_crash_size $ cat /proc/iomem 00000000-bfffffff : System RAM 00000000-005b7ac3 : Kernel code 005b7ac4-009743bf : Kernel data 009bb000-00a85c33 : Kernel bss c0000000-cfffffff : System RAM <<-- new rescoure d0000000-ffffffff : System RAM And now I can set chunk c0000000-cfffffff offline. Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-12kexec: remove KMSG_DUMP_KEXECWANG Cong
KMSG_DUMP_KEXEC is useless because we already save kernel messages inside /proc/vmcore, and it is unsafe to allow modules to do other stuffs in a crash dump scenario. [akpm@linux-foundation.org: fix powerpc build] Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com> Reported-by: Vivek Goyal <vgoyal@redhat.com> Acked-by: Vivek Goyal <vgoyal@redhat.com> Acked-by: Jarod Wilson <jarod@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-12-08PM / Sleep: Replace mutex_[un]lock(&pm_mutex) with [un]lock_system_sleep()Srivatsa S. Bhat
Using [un]lock_system_sleep() is safer than directly using mutex_[un]lock() on 'pm_mutex', since the latter could lead to freezing failures. Hence convert all the present users of mutex_[un]lock(&pm_mutex) to use these safe APIs instead. Suggested-by: Tejun Heo <tj@kernel.org> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Reviewed-by: Simon Horman <horms@verge.net.au> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-10-30[S390] kdump: Add infrastructure for unmapping crashkernel memoryMichael Holzheu
This patch introduces a mechanism that allows architecture backends to remove page tables for the crashkernel memory. This can protect the loaded kdump kernel from being overwritten by broken kernel code. Two new functions crash_map_reserved_pages() and crash_unmap_reserved_pages() are added that can be implemented by architecture code. The crash_map_reserved_pages() function is called before and crash_unmap_reserved_pages() after the crashkernel segments are loaded. The functions are also called in crash_shrink_memory() to create/remove page tables when the crashkernel memory size is reduced. To support architectures that have large pages this patch also introduces a new define KEXEC_CRASH_MEM_ALIGN. The crashkernel start and size must always be aligned with KEXEC_CRASH_MEM_ALIGN. Cc: Andrew Morton <akpm@linux-foundation.org> Acked-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-10-30[S390] kdump: Initialize vmcoreinfo note at startupMichael Holzheu
Currently the vmcoreinfo note is only initialized in case of kdump. On s390 it is possible to create kernel dumps with other dump mechanisms than kdump (e.g. via hypervisor dump or stand-alone dump tools). For those dumps it would also be desirable to include the vmcoreinfo data. To accomplish this, with this patch the vmcoreinfo ELF note is always initialized, not only in case of a (kdump) crash. On s390 we will add an ABI defined pointer at a well known address to vmcoreinfo so that dump analysis tools are able to find this information. In particular on s390 we have a tool named zgetdump. With this tool it is possible to convert dump formats on the fly using fuse. E.g. you can mount a s390 stand-alone dump as ELF dump. When this is done, the tool finds the vmcoreinfo in the stand-alone dump via the well known ABI defined address and it creates the respective VMCOREINFO ELF note in the output ELF dump. This then can be used e.g. by makedumpfile for dump filtering. No more need for a vmlinux file with debug information. So this will look like the following: $ zgetdump --mount standalone.dump -f elf /mnt $ ls /mnt dump.elf $ readelf -n /mnt/dump.elf $ ... VMCOREINFO 0x00000474 Unknown note type: (0x00000000) $ makedumpfile -c -d 31 /mnt/dump.elf dump.kdump Cc: Andrew Morton <akpm@linux-foundation.org> Acked-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-10-30[S390] kdump: Add KEXEC_CRASH_CONTROL_MEMORY_LIMITMichael Holzheu
On s390 there is a different KEXEC_CONTROL_MEMORY_LIMIT for the normal and the kdump kexec case. Therefore this patch introduces a new macro KEXEC_CRASH_CONTROL_MEMORY_LIMIT. This is set to KEXEC_CONTROL_MEMORY_LIMIT for all architectures that do not define KEXEC_CRASH_CONTROL_MEMORY_LIMIT. Acked-by: Vivek Goyal <vgoyal@redhat.com> Acked-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-06-10treewide: Convert uses of struct resource to resource_size(ptr)Joe Perches
Several fixes as well where the +1 was missing. Done via coccinelle scripts like: @@ struct resource *ptr; @@ - ptr->end - ptr->start + 1 + resource_size(ptr) and some grep and typing. Mostly uncompiled, no cross-compilers. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2011-05-11PM: Remove sysdev suspend, resume and shutdown operationsRafael J. Wysocki
Since suspend, resume and shutdown operations in struct sysdev_class and struct sysdev_driver are not used any more, remove them. Also drop sysdev_suspend(), sysdev_resume() and sysdev_shutdown() used for executing those operations and modify all of their users accordingly. This reduces kernel code size quite a bit and reduces its complexity. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-04-20PM: Add missing syscore_suspend() and syscore_resume() callsRafael J. Wysocki
Device suspend/resume infrastructure is used not only by the suspend and hibernate code in kernel/power, but also by APM, Xen and the kexec jump feature. However, commit 40dc166cb5dddbd36aa4ad11c03915ea (PM / Core: Introduce struct syscore_ops for core subsystems PM) failed to add syscore_suspend() and syscore_resume() calls to that code, which generally leads to breakage when the features in question are used. To fix this problem, add the missing syscore_suspend() and syscore_resume() calls to arch/x86/kernel/apm_32.c, kernel/kexec.c and drivers/xen/manage.c. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Greg Kroah-Hartman <gregkh@suse.de> Acked-by: Ian Campbell <ian.campbell@citrix.com>
2011-04-07Merge branch 'for-linus2' of git://git.profusion.mobi/users/lucas/linux-2.6Linus Torvalds
* 'for-linus2' of git://git.profusion.mobi/users/lucas/linux-2.6: Fix common misspellings
2011-04-01kdump: Allow shrinking of kdump region to be overriddenAnton Blanchard
On ppc64 the crashkernel region almost always overlaps an area of firmware. This works fine except when using the sysfs interface to reduce the kdump region. If we free the firmware area we are guaranteed to crash. Rename free_reserved_phys_range to crash_free_reserved_phys_range and make it a weak function so we can override it. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-03-31Fix common misspellingsLucas De Marchi
Fixes generated by 'codespell' and manually reviewed. Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
2010-11-01tree-wide: fix comment/printk typosUwe Kleine-König
"gadget", "through", "command", "maintain", "maintain", "controller", "address", "between", "initiali[zs]e", "instead", "function", "select", "already", "equal", "access", "management", "hierarchy", "registration", "interest", "relative", "memory", "offset", "already", Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2010-10-26use clear_page()/copy_page() in favor of memset()/memcpy() on whole pagesJan Beulich
After all that's what they are intended for. Signed-off-by: Jan Beulich <jbeulich@novell.com> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-11kexec: return -EFAULT on copy_to_user() failuresDan Carpenter
copy_to/from_user() returns the number of bytes remaining to be copied. It never returns a negative value. The correct return code is -EFAULT and not -EIO. All the callers check for non-zero returns so that's Ok, but the return code is passed to the user so we should fix this. Signed-off-by: Dan Carpenter <error27@gmail.com> Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Simon Kagstrom <simon.kagstrom@netinsight.net> Acked-by: WANG Cong <xiyou.wangcong@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-06-29kexec: fix Oops in crash_shrink_memory()Pavan Naregundi
When crashkernel is not enabled, "echo 0 > /sys/kernel/kexec_crash_size" OOPSes the kernel in crash_shrink_memory. This happens when crash_shrink_memory tries to release the 'crashk_res' resource which are not reserved. Also value of "/sys/kernel/kexec_crash_size" shows as 1, which should be 0. This patch fixes the OOPS in crash_shrink_memory and shows "/sys/kernel/kexec_crash_size" as 0 when crash kernel memory is not reserved. Signed-off-by: Pavan Naregundi <pavan@linux.vnet.ibm.com> Reviewed-by: WANG Cong <xiyou.wangcong@gmail.com> Cc: Simon Horman <horms@verge.net.au> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-11kexec: fix OOPS in crash_kernel_shrinkVitaly Mayatskikh
Two "echo 0 > /sys/kernel/kexec_crash_size" OOPSes kernel. Also content of this file is invalid after first shrink to zero: it shows 1 instead of 0. This scenario is unlikely to happen often (root privs, valid crashkernel= in cmdline, dump-capture kernel not loaded), I hit it only by chance. This patch fixes it. Signed-off-by: Vitaly Mayatskikh <v.mayatskih@gmail.com> Cc: Cong Wang <amwang@redhat.com> Cc: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-02-17percpu: add __percpu sparse annotations to core kernel subsystemsTejun Heo
Add __percpu sparse annotations to core subsystems. These annotations are to make sparse consider percpu variables to be in a different address space and warn if accessed without going through percpu accessors. This patch doesn't affect normal builds. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Christoph Lameter <cl@linux-foundation.org> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: linux-mm@kvack.org Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Dipankar Sarma <dipankar@in.ibm.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Eric Biederman <ebiederm@xmission.com>
2010-01-24Merge git://git.infradead.org/~dwmw2/mtd-2.6.33Linus Torvalds
* git://git.infradead.org/~dwmw2/mtd-2.6.33: mtd: tests: fix read, speed and stress tests on NOR flash mtd: Really add ARM pismo support kmsg_dump: Dump on crash_kexec as well
2009-12-31kmsg_dump: Dump on crash_kexec as wellKOSAKI Motohiro
crash_kexec gets called before kmsg_dump(KMSG_DUMP_OOPS) if panic_on_oops is set, so the kernel log buffer is not stored for this case. This patch adds a KMSG_DUMP_KEXEC dump type which gets called when crash_kexec() is invoked. To avoid getting double dumps, the old KMSG_DUMP_PANIC is moved below crash_kexec(). The mtdoops driver is modified to handle KMSG_DUMP_KEXEC in the same way as a panic. Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Acked-by: Simon Kagstrom <simon.kagstrom@netinsight.net> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>