aboutsummaryrefslogtreecommitdiff
path: root/mm/frontswap.c
AgeCommit message (Collapse)Author
2014-06-04swap: change swap_list_head to plist, add swap_avail_headDan Streetman
Originally get_swap_page() started iterating through the singly-linked list of swap_info_structs using swap_list.next or highest_priority_index, which both were intended to point to the highest priority active swap target that was not full. The first patch in this series changed the singly-linked list to a doubly-linked list, and removed the logic to start at the highest priority non-full entry; it starts scanning at the highest priority entry each time, even if the entry is full. Replace the manually ordered swap_list_head with a plist, swap_active_head. Add a new plist, swap_avail_head. The original swap_active_head plist contains all active swap_info_structs, as before, while the new swap_avail_head plist contains only swap_info_structs that are active and available, i.e. not full. Add a new spinlock, swap_avail_lock, to protect the swap_avail_head list. Mel Gorman suggested using plists since they internally handle ordering the list entries based on priority, which is exactly what swap was doing manually. All the ordering code is now removed, and swap_info_struct entries and simply added to their corresponding plist and automatically ordered correctly. Using a new plist for available swap_info_structs simplifies and optimizes get_swap_page(), which no longer has to iterate over full swap_info_structs. Using a new spinlock for swap_avail_head plist allows each swap_info_struct to add or remove themselves from the plist when they become full or not-full; previously they could not do so because the swap_info_struct->lock is held when they change from full<->not-full, and the swap_lock protecting the main swap_active_head must be ordered before any swap_info_struct->lock. Signed-off-by: Dan Streetman <ddstreet@ieee.org> Acked-by: Mel Gorman <mgorman@suse.de> Cc: Shaohua Li <shli@fusionio.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Hugh Dickins <hughd@google.com> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com> Cc: Weijie Yang <weijieut@gmail.com> Cc: Rik van Riel <riel@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Bob Liu <bob.liu@oracle.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04swap: change swap_info singly-linked list to list_headDan Streetman
The logic controlling the singly-linked list of swap_info_struct entries for all active, i.e. swapon'ed, swap targets is rather complex, because: - it stores the entries in priority order - there is a pointer to the highest priority entry - there is a pointer to the highest priority not-full entry - there is a highest_priority_index variable set outside the swap_lock - swap entries of equal priority should be used equally this complexity leads to bugs such as: https://lkml.org/lkml/2014/2/13/181 where different priority swap targets are incorrectly used equally. That bug probably could be solved with the existing singly-linked lists, but I think it would only add more complexity to the already difficult to understand get_swap_page() swap_list iteration logic. The first patch changes from a singly-linked list to a doubly-linked list using list_heads; the highest_priority_index and related code are removed and get_swap_page() starts each iteration at the highest priority swap_info entry, even if it's full. While this does introduce unnecessary list iteration (i.e. Schlemiel the painter's algorithm) in the case where one or more of the highest priority entries are full, the iteration and manipulation code is much simpler and behaves correctly re: the above bug; and the fourth patch removes the unnecessary iteration. The second patch adds some minor plist helper functions; nothing new really, just functions to match existing regular list functions. These are used by the next two patches. The third patch adds plist_requeue(), which is used by get_swap_page() in the next patch - it performs the requeueing of same-priority entries (which moves the entry to the end of its priority in the plist), so that all equal-priority swap_info_structs get used equally. The fourth patch converts the main list into a plist, and adds a new plist that contains only swap_info entries that are both active and not full. As Mel suggested using plists allows removing all the ordering code from swap - plists handle ordering automatically. The list naming is also clarified now that there are two lists, with the original list changed from swap_list_head to swap_active_head and the new list named swap_avail_head. A new spinlock is also added for the new list, so swap_info entries can be added or removed from the new list immediately as they become full or not full. This patch (of 4): Replace the singly-linked list tracking active, i.e. swapon'ed, swap_info_struct entries with a doubly-linked list using struct list_heads. Simplify the logic iterating and manipulating the list of entries, especially get_swap_page(), by using standard list_head functions, and removing the highest priority iteration logic. The change fixes the bug: https://lkml.org/lkml/2014/2/13/181 in which different priority swap entries after the highest priority entry are incorrectly used equally in pairs. The swap behavior is now as advertised, i.e. different priority swap entries are used in order, and equal priority swap targets are used concurrently. Signed-off-by: Dan Streetman <ddstreet@ieee.org> Acked-by: Mel Gorman <mgorman@suse.de> Cc: Shaohua Li <shli@fusionio.com> Cc: Hugh Dickins <hughd@google.com> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com> Cc: Weijie Yang <weijieut@gmail.com> Cc: Rik van Riel <riel@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Bob Liu <bob.liu@oracle.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-06-12frontswap: fix incorrect zeroing and allocation size for frontswap_mapAkinobu Mita
The bitmap accessed by bitops must have enough size to hold the required numbers of bits rounded up to a multiple of BITS_PER_LONG. And the bitmap must not be zeroed by memset() if the number of bits cleared is not a multiple of BITS_PER_LONG. This fixes incorrect zeroing and allocation size for frontswap_map. The incorrect zeroing part doesn't cause any problem because frontswap_map is freed just after zeroing. But the wrongly calculated allocation size may cause the problem. For 32bit systems, the allocation size of frontswap_map is about twice as large as required size. For 64bit systems, the allocation size is smaller than requeired if the number of bits is not a multiple of BITS_PER_LONG. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-30frontswap: get rid of swap_lock dependencyMinchan Kim
Frontswap initialization routine depends on swap_lock, which want to be atomic about frontswap's first appearance. IOW, frontswap is not present and will fail all calls OR frontswap is fully functional but if new swap_info_struct isn't registered by enable_swap_info, swap subsystem doesn't start I/O so there is no race between init procedure and page I/O working on frontswap. So let's remove unnecessary swap_lock dependency. Cc: Dan Magenheimer <dan.magenheimer@oracle.com> Signed-off-by: Minchan Kim <minchan@kernel.org> [v1: Rebased on my branch, reworked to work with backends loading late] [v2: Added a check for !map] [v3: Made the invalidate path follow the init path] [v4: Address comments by Wanpeng Li <liwanp@linux.vnet.ibm.com>] Signed-off-by: Konrad Rzeszutek Wilk <konrad@darnok.org> Signed-off-by: Bob Liu <lliubbo@gmail.com> Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com> Cc: Andor Daam <andor.daam@googlemail.com> Cc: Florian Schmaus <fschmaus@gmail.com> Cc: Stefan Hengelein <ilendir@googlemail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-30mm: frontswap: cleanup codeBob Liu
After allowing tmem backends to build/run as modules, frontswap_enabled always true if defined CONFIG_FRONTSWAP. But frontswap_test() depends on whether backend is registered, mv it into frontswap.c using fronstswap_ops to make the decision. frontswap_set/clear are not used outside frontswap, so don't export them. Signed-off-by: Bob Liu <lliubbo@gmail.com> Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com> Cc: Andor Daam <andor.daam@googlemail.com> Cc: Dan Magenheimer <dan.magenheimer@oracle.com> Cc: Florian Schmaus <fschmaus@gmail.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Stefan Hengelein <ilendir@googlemail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-30frontswap: make frontswap_init use a pointer for the opsKonrad Rzeszutek Wilk
This simplifies the code in the frontswap - we can get rid of the 'backend_registered' test and instead check against frontswap_ops. [v1: Rebase on top of 703ba7fe5e0 (ramster->zcache move] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Bob Liu <lliubbo@gmail.com> Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com> Cc: Andor Daam <andor.daam@googlemail.com> Cc: Dan Magenheimer <dan.magenheimer@oracle.com> Cc: Florian Schmaus <fschmaus@gmail.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Stefan Hengelein <ilendir@googlemail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-30mm: frontswap: lazy initialization to allow tmem backends to build/run as ↵Dan Magenheimer
modules With the goal of allowing tmem backends (zcache, ramster, Xen tmem) to be built/loaded as modules rather than built-in and enabled by a boot parameter, this patch provides "lazy initialization", allowing backends to register to frontswap even after swapon was run. Before a backend registers all calls to init are recorded and the creation of tmem_pools delayed until a backend registers or until a frontswap store is attempted. Signed-off-by: Stefan Hengelein <ilendir@googlemail.com> Signed-off-by: Florian Schmaus <fschmaus@gmail.com> Signed-off-by: Andor Daam <andor.daam@googlemail.com> Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com> [v1: Fixes per Seth Jennings suggestions] [v2: Removed FRONTSWAP_HAS_.. ] [v3: Fix up per Bob Liu <lliubbo@gmail.com> recommendations] [v4: Fix up per Andrew's comments] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Bob Liu <lliubbo@gmail.com> Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com> Cc: Dan Magenheimer <dan.magenheimer@oracle.com> Cc: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-09-21frontswap: support exclusive gets if tmem backend is capableDan Magenheimer
Tmem, as originally specified, assumes that "get" operations performed on persistent pools never flush the page of data out of tmem on a successful get, waiting instead for a flush operation. This is intended to mimic the model of a swap disk, where a disk read is non-destructive. Unlike a disk, however, freeing up the RAM can be valuable. Over the years that frontswap was in the review process, several reviewers (and notably Hugh Dickins in 2010) pointed out that this would result, at least temporarily, in two copies of the data in RAM: one (compressed for zcache) copy in tmem, and one copy in the swap cache. We wondered if this could be done differently, at least optionally. This patch allows tmem backends to instruct the frontswap code that this backend performs exclusive gets. Zcache2 already contains hooks to support this feature. Other backends are completely unaffected unless/until they are updated to support this feature. While it is not clear that exclusive gets are a performance win on all workloads at all times, this small patch allows for experimentation by backends. P.S. Let's not quibble about the naming of "get" vs "read" vs "load" etc. The naming is currently horribly inconsistent between cleancache and frontswap and existing tmem backends, so will need to be straightened out as a separate patch. "Get" is used by the tmem architecture spec, existing backends, and all documentation and presentation material so I am using it in this patch. Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-09-21mm: frontswap: fix a wrong if condition in frontswap_shrinkZhenzhong Duan
pages_to_unuse is set to 0 to unuse all frontswap pages But that doesn't happen since a wrong condition in frontswap_shrink cancel it. -v2: Add comment to explain return value of __frontswap_shrink, as suggested by Dan Carpenter, thanks Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-08-13mm/frontswap: fix uninit'ed variable warningSeth Jennings
Fixes uninitialized variable warning on 'type' in frontswap_shrink(). type is set before use by __frontswap_unuse_pages() called by __frontswap_shrink() called by frontswap_shrink() before use by try_to_unuse(). Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-07-23mm/frontswap: cleanup doc and comment errorWanpeng Li
Signed-off-by: Wanpeng Li <liwp.linux@gmail.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-07-23mm: frontswap: remove unneeded headersSasha Levin
Signed-off-by: Sasha Levin <levinsasha928@gmail.com> [v1: Rebased with tracing removed] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-07-19mm: frontswap: split out function to clear a page outSasha Levin
Signed-off-by: Sasha Levin <levinsasha928@gmail.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-06-11mm: frontswap: remove unnecessary check during initializationSasha Levin
The check whether frontswap is enabled or not is done in the API functions in the frontswap header, before they are passed to the internal double-underscored frontswap functions. Remove the check from __frontswap_init for consistency. Reviewed-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Sasha Levin <levinsasha928@gmail.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-06-11mm: frontswap: make all branches of if statement in put page consistentSasha Levin
Currently it has a complex structure where different things are compared at each branch. Simplify that and make both branches look similar. Reviewed-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Sasha Levin <levinsasha928@gmail.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-06-11mm: frontswap: split frontswap_shrink further to simplify lockingSasha Levin
Split frontswap_shrink to simplify the locking in the original code. Also, assert that the function that was split still runs under the swap spinlock. Signed-off-by: Sasha Levin <levinsasha928@gmail.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-06-11mm: frontswap: split out __frontswap_unuse_pagesSasha Levin
An attempt at making frontswap_shrink shorter and more readable. This patch splits out walking through the swap list to find an entry with enough pages to unuse. Also, assert that the internal __frontswap_unuse_pages is called under swap lock, since that part of code was previously directly happen inside the lock. Signed-off-by: Sasha Levin <levinsasha928@gmail.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-06-11mm: frontswap: split out __frontswap_curr_pagesSasha Levin
Code was duplicated in two functions, clean it up. Also, assert that the deduplicated code runs under the swap spinlock. Reviewed-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Sasha Levin <levinsasha928@gmail.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-06-11mm: frontswap: trivial coding convention issuesSasha Levin
Reviewed-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Sasha Levin <levinsasha928@gmail.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-06-11mm: frontswap: remove casting from function calls through ops structureSasha Levin
Removes unneeded casts. Reviewed-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Sasha Levin <levinsasha928@gmail.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-05-15frontswap: s/put_page/store/g s/get_page/loadKonrad Rzeszutek Wilk
Sounds so much more natural. Suggested-by: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-05-15mm: frontswap: core frontswap functionalityDan Magenheimer
This patch, 3of4, provides the core frontswap code that interfaces between the hooks in the swap subsystem and a frontswap backend via frontswap_ops. --- New file added: mm/frontswap.c [v14: add support for writethrough, per suggestion by aarcange@redhat.com] [v11: sjenning@linux.vnet.ibm.com: s/puts/failed_puts/] [v10: sjenning@linux.vnet.ibm.com: fix debugfs calls on 32-bit] [v9: akpm@linux-foundation.org: change "flush" to "invalidate", part 1] [v9: akpm@linux-foundation.org: mark some statics __read_mostly] [v9: akpm@linux-foundation.org: add clarifying comments] [v9: akpm@linux-foundation.org: no need to loop repeating try_to_unuse] [v9: error27@gmail.com: remove superfluous check for NULL] [v8: rebase to 3.0-rc4] [v8: kamezawa.hiroyu@jp.fujitsu.com: add comment to clarify find_next_to_unuse] [v7: rebase to 3.0-rc3] [v7: JBeulich@novell.com: use new static inlines, no-ops if not config'd] [v6: rebase to 3.1-rc1] [v6: lliubbo@gmail.com: use vzalloc] [v6: lliubbo@gmail.com: fix null pointer deref if vzalloc fails] [v6: konrad.wilk@oracl.com: various checks and code clarifications/comments] [v4: rebase to 2.6.39] Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com> Acked-by: Jan Beulich <JBeulich@novell.com> Acked-by: Seth Jennings <sjenning@linux.vnet.ibm.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Matthew Wilcox <matthew@wil.cx> Cc: Chris Mason <chris.mason@oracle.com> Cc: Rik Riel <riel@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> [v12: Squashed s/flush/invalidate/ in] [v15: A bit of cleanup and seperate DEBUGFS] Signed-off-by: Konrad Wilk <konrad.wilk@oracle.com>