aboutsummaryrefslogtreecommitdiff
path: root/include/linux/fs.h
AgeCommit message (Collapse)Author
2022-01-22mm: remove cleancacheChristoph Hellwig
Patch series "remove Xen tmem leftovers". Since the removal of the Xen tmem driver in 2019, the cleancache hooks are entirely unused, as are large parts of frontswap. This series against linux-next (with the folio changes included) removes cleancaches, and cuts down frontswap to the bits actually used by zswap. This patch (of 13): The cleancache subsystem is unused since the removal of Xen tmem driver in commit 814bbf49dcd0 ("xen: remove tmem driver"). [akpm@linux-foundation.org: remove now-unreachable code] Link: https://lkml.kernel.org/r/20211224062246.1258487-1-hch@lst.de Link: https://lkml.kernel.org/r/20211224062246.1258487-2-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Juergen Gross <jgross@suse.com> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Konrad Rzeszutek Wilk <Konrad.wilk@oracle.com> Cc: Hugh Dickins <hughd@google.com> Cc: Seth Jennings <sjenning@redhat.com> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Vitaly Wool <vitaly.wool@konsulko.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-22fs: move namei sysctls to its own fileLuis Chamberlain
kernel/sysctl.c is a kitchen sink where everyone leaves their dirty dishes, this makes it very difficult to maintain. To help with this maintenance let's start by moving sysctls to places where they actually belong. The proc sysctl maintainers do not want to know what sysctl knobs you wish to add for your own piece of code, we just care about the core logic. So move namei's own sysctl knobs to its own file. Other than the move we also avoid initializing two static variables to 0 as this is not needed: * sysctl_protected_symlinks * sysctl_protected_hardlinks Link: https://lkml.kernel.org/r/20211129205548.605569-8-mcgrof@kernel.org Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Antti Palosaari <crope@iki.fi> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Iurii Zaikin <yzaikin@google.com> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Jeff Layton <jlayton@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Lukas Middendorf <kernel@tuxforce.de> Cc: Stephen Kitt <steve@sk2.org> Cc: Xiaoming Ni <nixiaoming@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-22fs: move locking sysctls where they are usedLuis Chamberlain
kernel/sysctl.c is a kitchen sink where everyone leaves their dirty dishes, this makes it very difficult to maintain. To help with this maintenance let's start by moving sysctls to places where they actually belong. The proc sysctl maintainers do not want to know what sysctl knobs you wish to add for your own piece of code, we just care about the core logic. The locking fs sysctls are only used on fs/locks.c, so move them there. Link: https://lkml.kernel.org/r/20211129205548.605569-7-mcgrof@kernel.org Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Antti Palosaari <crope@iki.fi> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Iurii Zaikin <yzaikin@google.com> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Jeff Layton <jlayton@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Lukas Middendorf <kernel@tuxforce.de> Cc: Stephen Kitt <steve@sk2.org> Cc: Xiaoming Ni <nixiaoming@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-22fs: move dcache sysctls to its own fileLuis Chamberlain
kernel/sysctl.c is a kitchen sink where everyone leaves their dirty dishes, this makes it very difficult to maintain. To help with this maintenance let's start by moving sysctls to places where they actually belong. The proc sysctl maintainers do not want to know what sysctl knobs you wish to add for your own piece of code, we just care about the core logic. So move the dcache sysctl clutter out of kernel/sysctl.c. This is a small one-off entry, perhaps later we can simplify this representation, but for now we use the helpers we have. We won't know how we can simplify this further untl we're fully done with the cleanup. [arnd@arndb.de: avoid unused-function warning] Link: https://lkml.kernel.org/r/20211203190123.874239-2-arnd@kernel.org Link: https://lkml.kernel.org/r/20211129205548.605569-4-mcgrof@kernel.org Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Antti Palosaari <crope@iki.fi> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Iurii Zaikin <yzaikin@google.com> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Jeff Layton <jlayton@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Lukas Middendorf <kernel@tuxforce.de> Cc: Stephen Kitt <steve@sk2.org> Cc: Xiaoming Ni <nixiaoming@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-22fs: move fs stat sysctls to file_table.cLuis Chamberlain
kernel/sysctl.c is a kitchen sink where everyone leaves their dirty dishes, this makes it very difficult to maintain. To help with this maintenance let's start by moving sysctls to places where they actually belong. The proc sysctl maintainers do not want to know what sysctl knobs you wish to add for your own piece of code, we just care about the core logic. We can create the sysctl dynamically on early init for fs stat to help with this clutter. This dusts off the fs stat syctls knobs and puts them into where they are declared. Link: https://lkml.kernel.org/r/20211129205548.605569-3-mcgrof@kernel.org Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Antti Palosaari <crope@iki.fi> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Iurii Zaikin <yzaikin@google.com> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Jeff Layton <jlayton@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Lukas Middendorf <kernel@tuxforce.de> Cc: Stephen Kitt <steve@sk2.org> Cc: Xiaoming Ni <nixiaoming@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-22fs: move inode sysctls to its own fileLuis Chamberlain
Patch series "sysctl: 4th set of kernel/sysctl cleanups". This is slimming down the fs uses of kernel/sysctl.c to the point that the next step is to just get rid of the fs base directory for it and move that elsehwere, so that next patch series starts dealing with that to demo how we can end up cleaning up a full base directory from kernel/sysctl.c, one at a time. This patch (of 9): kernel/sysctl.c is a kitchen sink where everyone leaves their dirty dishes, this makes it very difficult to maintain. To help with this maintenance let's start by moving sysctls to places where they actually belong. The proc sysctl maintainers do not want to know what sysctl knobs you wish to add for your own piece of code, we just care about the core logic. So move the inode sysctls to its own file. Since we are no longer using this outside of fs/ remove the extern declaration of its respective proc helper. We use early_initcall() as it is the earliest we can use. [arnd@arndb.de: avoid unused-variable warning] Link: https://lkml.kernel.org/r/20211203190123.874239-1-arnd@kernel.org Link: https://lkml.kernel.org/r/20211129205548.605569-1-mcgrof@kernel.org Link: https://lkml.kernel.org/r/20211129205548.605569-2-mcgrof@kernel.org Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Kees Cook <keescook@chromium.org> Cc: Iurii Zaikin <yzaikin@google.com> Cc: Xiaoming Ni <nixiaoming@huawei.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Stephen Kitt <steve@sk2.org> Cc: Lukas Middendorf <kernel@tuxforce.de> Cc: Antti Palosaari <crope@iki.fi> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Jeff Layton <jlayton@kernel.org> Cc: "J. Bruce Fields" <bfields@fieldses.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-16Merge tag 'nfsd-5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linuxLinus Torvalds
Pull nfsd updates from Chuck Lever: "Bruce has announced he is leaving Red Hat at the end of the month and is stepping back from his role as NFSD co-maintainer. As a result, this includes a patch removing him from the MAINTAINERS file. There is one patch in here that Jeff Layton was carrying in the locks tree. Since he had only one for this cycle, he asked us to send it to you via the nfsd tree. There continues to be 0-day reports from Robert Morris @MIT. This time we include a fix for a crash in the COPY_NOTIFY operation. Highlights: - Bruce steps down as NFSD maintainer - Prepare for dynamic nfsd thread management - More work on supporting re-exporting NFS mounts - One fs/locks patch on behalf of Jeff Layton Notable bug fixes: - Fix zero-length NFSv3 WRITEs - Fix directory cinfo on FS's that do not support iversion - Fix WRITE verifiers for stable writes - Fix crash on COPY_NOTIFY with a special state ID" * tag 'nfsd-5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (51 commits) SUNRPC: Fix sockaddr handling in svcsock_accept_class trace points SUNRPC: Fix sockaddr handling in the svc_xprt_create_error trace point fs/locks: fix fcntl_getlk64/fcntl_setlk64 stub prototypes nfsd: fix crash on COPY_NOTIFY with special stateid MAINTAINERS: remove bfields NFSD: Move fill_pre_wcc() and fill_post_wcc() Revert "nfsd: skip some unnecessary stats in the v4 case" NFSD: Trace boot verifier resets NFSD: Rename boot verifier functions NFSD: Clean up the nfsd_net::nfssvc_boot field NFSD: Write verifier might go backwards nfsd: Add a tracepoint for errors in nfsd4_clone_file_range() NFSD: De-duplicate net_generic(nf->nf_net, nfsd_net_id) NFSD: De-duplicate net_generic(SVC_NET(rqstp), nfsd_net_id) NFSD: Clean up nfsd_vfs_write() nfsd: Replace use of rwsem with errseq_t NFSD: Fix verifier returned in stable WRITEs nfsd: Retry once in nfsd_open on an -EOPENSTALE return nfsd: Add errno mapping for EREMOTEIO nfsd: map EBADF ...
2022-01-15Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge misc updates from Andrew Morton: "146 patches. Subsystems affected by this patch series: kthread, ia64, scripts, ntfs, squashfs, ocfs2, vfs, and mm (slab-generic, slab, kmemleak, dax, kasan, debug, pagecache, gup, shmem, frontswap, memremap, memcg, selftests, pagemap, dma, vmalloc, memory-failure, hugetlb, userfaultfd, vmscan, mempolicy, oom-kill, hugetlbfs, migration, thp, ksm, page-poison, percpu, rmap, zswap, zram, cleanups, hmm, and damon)" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (146 commits) mm/damon: hide kernel pointer from tracepoint event mm/damon/vaddr: hide kernel pointer from damon_va_three_regions() failure log mm/damon/vaddr: use pr_debug() for damon_va_three_regions() failure logging mm/damon/dbgfs: remove an unnecessary variable mm/damon: move the implementation of damon_insert_region to damon.h mm/damon: add access checking for hugetlb pages Docs/admin-guide/mm/damon/usage: update for schemes statistics mm/damon/dbgfs: support all DAMOS stats Docs/admin-guide/mm/damon/reclaim: document statistics parameters mm/damon/reclaim: provide reclamation statistics mm/damon/schemes: account how many times quota limit has exceeded mm/damon/schemes: account scheme actions that successfully applied mm/damon: remove a mistakenly added comment for a future feature Docs/admin-guide/mm/damon/usage: update for kdamond_pid and (mk|rm)_contexts Docs/admin-guide/mm/damon/usage: mention tracepoint at the beginning Docs/admin-guide/mm/damon/usage: remove redundant information Docs/admin-guide/mm/damon/usage: update for scheme quotas and watermarks mm/damon: convert macro functions to static inline functions mm/damon: modify damon_rand() macro to static inline function mm/damon: move damon_rand() definition into damon.h ...
2022-01-15mm,fs: split dump_mapping() out from dump_page()Matthew Wilcox (Oracle)
dump_mapping() is a big chunk of dump_page(), and it'd be handy to be able to call it when we don't have a struct page. Split it out and move it to fs/inode.c. Take the opportunity to simplify some of the debug messages a little. Link: https://lkml.kernel.org/r/20211121121056.2870061-1-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: William Kucharski <william.kucharski@oracle.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-12Merge tag 'fscache-rewrite-20220111' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs Pull fscache rewrite from David Howells: "This is a set of patches that rewrites the fscache driver and the cachefiles driver, significantly simplifying the code compared to what's upstream, removing the complex operation scheduling and object state machine in favour of something much smaller and simpler. The series is structured such that the first few patches disable fscache use by the network filesystems using it, remove the cachefiles driver entirely and as much of the fscache driver as can be got away with without causing build failures in the network filesystems. The patches after that recreate fscache and then cachefiles, attempting to add the pieces in a logical order. Finally, the filesystems are reenabled and then the very last patch changes the documentation. [!] Note: I have dropped the cifs patch for the moment, leaving local caching in cifs disabled. I've been having trouble getting that working. I think I have it done, but it needs more testing (there seem to be some test failures occurring with v5.16 also from xfstests), so I propose deferring that patch to the end of the merge window. WHY REWRITE? ============ Fscache's operation scheduling API was intended to handle sequencing of cache operations, which were all required (where possible) to run asynchronously in parallel with the operations being done by the network filesystem, whilst allowing the cache to be brought online and offline and to interrupt service for invalidation. With the advent of the tmpfile capacity in the VFS, however, an opportunity arises to do invalidation much more simply, without having to wait for I/O that's actually in progress: Cachefiles can simply create a tmpfile, cut over the file pointer for the backing object attached to a cookie and abandon the in-progress I/O, dismissing it upon completion. Future work here would involve using Omar Sandoval's vfs_link() with AT_LINK_REPLACE[1] to allow an extant file to be displaced by a new hard link from a tmpfile as currently I have to unlink the old file first. These patches can also simplify the object state handling as I/O operations to the cache don't all have to be brought to a stop in order to invalidate a file. To that end, and with an eye on to writing a new backing cache model in the future, I've taken the opportunity to simplify the indexing structure. I've separated the index cookie concept from the file cookie concept by C type now. The former is now called a "volume cookie" (struct fscache_volume) and there is a container of file cookies. There are then just the two levels. All the index cookie levels are collapsed into a single volume cookie, and this has a single printable string as a key. For instance, an AFS volume would have a key of something like "afs,example.com,1000555", combining the filesystem name, cell name and volume ID. This is freeform, but must not have '/' chars in it. I've also eliminated all pointers back from fscache into the network filesystem. This required the duplication of a little bit of data in the cookie (cookie key, coherency data and file size), but it's not actually that much. This gets rid of problems with making sure we keep netfs data structures around so that the cache can access them. These patches mean that most of the code that was in the drivers before is simply gone and those drivers are now almost entirely new code. That being the case, there doesn't seem any particular reason to try and maintain bisectability across it. Further, there has to be a point in the middle where things are cut over as there's a single point everything has to go through (ie. /dev/cachefiles) and it can't be in use by two drivers at once. ISSUES YET OUTSTANDING ====================== There are some issues still outstanding, unaddressed by this patchset, that will need fixing in future patchsets, but that don't stop this series from being usable: (1) The cachefiles driver needs to stop using the backing filesystem's metadata to store information about what parts of the cache are populated. This is not reliable with modern extent-based filesystems. Fixing this is deferred to a separate patchset as it involves negotiation with the network filesystem and the VM as to how much data to download to fulfil a read - which brings me on to (2)... (2) NFS (and CIFS with the dropped patch) do not take account of how the cache would like I/O to be structured to meet its granularity requirements. Previously, the cache used page granularity, which was fine as the network filesystems also dealt in page granularity, and the backing filesystem (ext4, xfs or whatever) did whatever it did out of sight. However, we now have folios to deal with and the cache will now have to store its own metadata to track its contents. The change I'm looking at making for cachefiles is to store content bitmaps in one or more xattrs and making a bit in the map correspond to something like a 256KiB block. However, the size of an xattr and the fact that they have to be read/updated in one go means that I'm looking at covering 1GiB of data per 512-byte map and storing each map in an xattr. Cachefiles has the potential to grow into a fully fledged filesystem of its very own if I'm not careful. However, I'm also looking at changing things even more radically and going to a different model of how the cache is arranged and managed - one that's more akin to the way, say, openafs does things - which brings me on to (3)... (3) The way cachefilesd does culling is very inefficient for large caches and it would be better to move it into the kernel if I can as cachefilesd has to keep asking the kernel if it can cull a file. Changing the way the backend works would allow this to be addressed. BITS THAT MAY BE CONTROVERSIAL ============================== There are some bits I've added that may be controversial: (1) I've provided a flag, S_KERNEL_FILE, that cachefiles uses to check if a files is already being used by some other kernel service (e.g. a duplicate cachefiles cache in the same directory) and reject it if it is. This isn't entirely necessary, but it helps prevent accidental data corruption. I don't want to use S_SWAPFILE as that has other effects, but quite possibly swapon() should set S_KERNEL_FILE too. Note that it doesn't prevent userspace from interfering, though perhaps it should. (I have made it prevent a marked directory from being rmdir-able). (2) Cachefiles wants to keep the backing file for a cookie open whilst we might need to write to it from network filesystem writeback. The problem is that the network filesystem unuses its cookie when its file is closed, and so we have nothing pinning the cachefiles file open and it will get closed automatically after a short time to avoid EMFILE/ENFILE problems. Reopening the cache file, however, is a problem if this is being done due to writeback triggered by exit(). Some filesystems will oops if we try to open a file in that context because they want to access current->fs or suchlike. To get around this, I added the following: (A) An inode flag, I_PINNING_FSCACHE_WB, to be set on a network filesystem inode to indicate that we have a usage count on the cookie caching that inode. (B) A flag in struct writeback_control, unpinned_fscache_wb, that is set when __writeback_single_inode() clears the last dirty page from i_pages - at which point it clears I_PINNING_FSCACHE_WB and sets this flag. This has to be done here so that clearing I_PINNING_FSCACHE_WB can be done atomically with the check of PAGECACHE_TAG_DIRTY that clears I_DIRTY_PAGES. (C) A function, fscache_set_page_dirty(), which if it is not set, sets I_PINNING_FSCACHE_WB and calls fscache_use_cookie() to pin the cache resources. (D) A function, fscache_unpin_writeback(), to be called by ->write_inode() to unuse the cookie. (E) A function, fscache_clear_inode_writeback(), to be called when the inode is evicted, before clear_inode() is called. This cleans up any lingering I_PINNING_FSCACHE_WB. The network filesystem can then use these tools to make sure that fscache_write_to_cache() can write locally modified data to the cache as well as to the server. For the future, I'm working on write helpers for netfs lib that should allow this facility to be removed by keeping track of the dirty regions separately - but that's incomplete at the moment and is also going to be affected by folios, one way or another, since it deals with pages" Link: https://lore.kernel.org/all/510611.1641942444@warthog.procyon.org.uk/ Tested-by: Dominique Martinet <asmadeus@codewreck.org> # 9p Tested-by: kafs-testing@auristor.com # afs Tested-by: Jeff Layton <jlayton@kernel.org> # ceph Tested-by: Dave Wysochanski <dwysocha@redhat.com> # nfs Tested-by: Daire Byrne <daire@dneg.com> # nfs * tag 'fscache-rewrite-20220111' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: (67 commits) 9p, afs, ceph, nfs: Use current_is_kswapd() rather than gfpflags_allow_blocking() fscache: Add a tracepoint for cookie use/unuse fscache: Rewrite documentation ceph: add fscache writeback support ceph: conversion to new fscache API nfs: Implement cache I/O by accessing the cache directly nfs: Convert to new fscache volume/cookie API 9p: Copy local writes to the cache when writing to the server 9p: Use fscache indexing rewrite and reenable caching afs: Skip truncation on the server of data we haven't written yet afs: Copy local writes to the cache when writing to the server afs: Convert afs to use the new fscache API fscache, cachefiles: Display stat of culling events fscache, cachefiles: Display stats of no-space events cachefiles: Allow cachefiles to actually function fscache, cachefiles: Store the volume coherency data cachefiles: Implement the I/O routines cachefiles: Implement cookie resize for truncate cachefiles: Implement begin and end I/O operation cachefiles: Implement backing file wrangling ...
2022-01-12Merge tag 'for-5.17/block-2022-01-11' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block updates from Jens Axboe: - Unify where the struct request handling code is located in the blk-mq code (Christoph) - Header cleanups (Christoph) - Clean up the io_context handling code (Christoph, me) - Get rid of ->rq_disk in struct request (Christoph) - Error handling fix for add_disk() (Christoph) - request allocation cleanusp (Christoph) - Documentation updates (Eric, Matthew) - Remove trivial crypto unregister helper (Eric) - Reduce shared tag overhead (John) - Reduce poll_stats memory overhead (me) - Known indirect function call for dio (me) - Use atomic references for struct request (me) - Support request list issue for block and NVMe (me) - Improve queue dispatch pinning (Ming) - Improve the direct list issue code (Keith) - BFQ improvements (Jan) - Direct completion helper and use it in mmc block (Sebastian) - Use raw spinlock for the blktrace code (Wander) - fsync error handling fix (Ye) - Various fixes and cleanups (Lukas, Randy, Yang, Tetsuo, Ming, me) * tag 'for-5.17/block-2022-01-11' of git://git.kernel.dk/linux-block: (132 commits) MAINTAINERS: add entries for block layer documentation docs: block: remove queue-sysfs.rst docs: sysfs-block: document virt_boundary_mask docs: sysfs-block: document stable_writes docs: sysfs-block: fill in missing documentation from queue-sysfs.rst docs: sysfs-block: add contact for nomerges docs: sysfs-block: sort alphabetically docs: sysfs-block: move to stable directory block: don't protect submit_bio_checks by q_usage_counter block: fix old-style declaration nvme-pci: fix queue_rqs list splitting block: introduce rq_list_move block: introduce rq_list_for_each_safe macro block: move rq_list macros to blk-mq.h block: drop needless assignment in set_task_ioprio() block: remove unnecessary trailing '\' bio.h: fix kernel-doc warnings block: check minor range in device_add_disk() block: use "unsigned long" for blk_validate_block_size(). block: fix error unwinding in device_add_disk ...
2022-01-09fs/locks: fix fcntl_getlk64/fcntl_setlk64 stub prototypesArnd Bergmann
My patch to rework oabi fcntl64() introduced a harmless sparse warning when file locking is disabled: arch/arm/kernel/sys_oabi-compat.c:251:51: sparse: sparse: incorrect type in argument 3 (different address spaces) @@ expected struct flock64 [noderef] __user *user @@ got struct flock64 * @@ arch/arm/kernel/sys_oabi-compat.c:251:51: sparse: expected struct flock64 [noderef] __user *user arch/arm/kernel/sys_oabi-compat.c:251:51: sparse: got struct flock64 * arch/arm/kernel/sys_oabi-compat.c:265:55: sparse: sparse: incorrect type in argument 4 (different address spaces) @@ expected struct flock64 [noderef] __user *user @@ got struct flock64 * @@ arch/arm/kernel/sys_oabi-compat.c:265:55: sparse: expected struct flock64 [noderef] __user *user arch/arm/kernel/sys_oabi-compat.c:265:55: sparse: got struct flock64 * When file locking is enabled, everything works correctly and the right data gets passed, but the stub declarations in linux/fs.h did not get modified when the calling conventions changed in an earlier patch. Reported-by: kernel test robot <lkp@intel.com> Fixes: 7e2d8c29ecdd ("ARM: 9111/1: oabi-compat: rework fcntl64() emulation") Fixes: a75d30c77207 ("fs/locks: pass kernel struct flock to fcntl_getlk/setlk") Cc: Christoph Hellwig <hch@lst.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-01-07vfs, cachefiles: Mark a backing file in use with an inode flagDavid Howells
Use an inode flag, S_KERNEL_FILE, to mark that a backing file is in use by the kernel to prevent cachefiles or other kernel services from interfering with that file. Alter rmdir to reject attempts to remove a directory marked with this flag. This is used by cachefiles to prevent cachefilesd from removing them. Using S_SWAPFILE instead isn't really viable as that has other effects in the I/O paths. Changes ======= ver #3: - Check for the object pointer being NULL in the tracepoints rather than the caller. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com Link: https://lore.kernel.org/r/163819630256.215744.4815885535039369574.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/163906931596.143852.8642051223094013028.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/163967141000.1823006.12920680657559677789.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/164021541207.640689.564689725898537127.stgit@warthog.procyon.org.uk/ # v4
2022-01-07vfs, fscache: Implement pinning of cache usage for writebackDavid Howells
Cachefiles has a problem in that it needs to keep the backing file for a cookie open whilst there are local modifications pending that need to be written to it. However, we don't want to keep the file open indefinitely, as that causes EMFILE/ENFILE/ENOMEM problems. Reopening the cache file, however, is a problem if this is being done due to writeback triggered by exit(). Some filesystems will oops if we try to open a file in that context because they want to access current->fs or other resources that have already been dismantled. To get around this, I added the following: (1) An inode flag, I_PINNING_FSCACHE_WB, to be set on a network filesystem inode to indicate that we have a usage count on the cookie caching that inode. (2) A flag in struct writeback_control, unpinned_fscache_wb, that is set when __writeback_single_inode() clears the last dirty page from i_pages - at which point it clears I_PINNING_FSCACHE_WB and sets this flag. This has to be done here so that clearing I_PINNING_FSCACHE_WB can be done atomically with the check of PAGECACHE_TAG_DIRTY that clears I_DIRTY_PAGES. (3) A function, fscache_set_page_dirty(), which if it is not set, sets I_PINNING_FSCACHE_WB and calls fscache_use_cookie() to pin the cache resources. (4) A function, fscache_unpin_writeback(), to be called by ->write_inode() to unuse the cookie. (5) A function, fscache_clear_inode_writeback(), to be called when the inode is evicted, before clear_inode() is called. This cleans up any lingering I_PINNING_FSCACHE_WB. The network filesystem can then use these tools to make sure that fscache_write_to_cache() can write locally modified data to the cache as well as to the server. For the future, I'm working on write helpers for netfs lib that should allow this facility to be removed by keeping track of the dirty regions separately - but that's incomplete at the moment and is also going to be affected by folios, one way or another, since it deals with pages Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com Link: https://lore.kernel.org/r/163819615157.215744.17623791756928043114.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/163906917856.143852.8224898306177154573.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/163967124567.1823006.14188359004568060298.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/164021524705.640689.17824932021727663017.stgit@warthog.procyon.org.uk/ # v4
2021-12-05fs: support mapped mounts of mapped filesystemsChristian Brauner
In previous patches we added new and modified existing helpers to handle idmapped mounts of filesystems mounted with an idmapping. In this final patch we convert all relevant places in the vfs to actually pass the filesystem's idmapping into these helpers. With this the vfs is in shape to handle idmapped mounts of filesystems mounted with an idmapping. Note that this is just the generic infrastructure. Actually adding support for idmapped mounts to a filesystem mountable with an idmapping is follow-up work. In this patch we extend the definition of an idmapped mount from a mount that that has the initial idmapping attached to it to a mount that has an idmapping attached to it which is not the same as the idmapping the filesystem was mounted with. As before we do not allow the initial idmapping to be attached to a mount. In addition this patch prevents that the idmapping the filesystem was mounted with can be attached to a mount created based on this filesystem. This has multiple reasons and advantages. First, attaching the initial idmapping or the filesystem's idmapping doesn't make much sense as in both cases the values of the i_{g,u}id and other places where k{g,u}ids are used do not change. Second, a user that really wants to do this for whatever reason can just create a separate dedicated identical idmapping to attach to the mount. Third, we can continue to use the initial idmapping as an indicator that a mount is not idmapped allowing us to continue to keep passing the initial idmapping into the mapping helpers to tell them that something isn't an idmapped mount even if the filesystem is mounted with an idmapping. Link: https://lore.kernel.org/r/20211123114227.3124056-11-brauner@kernel.org (v1) Link: https://lore.kernel.org/r/20211130121032.3753852-11-brauner@kernel.org (v2) Link: https://lore.kernel.org/r/20211203111707.3901969-11-brauner@kernel.org Cc: Seth Forshee <sforshee@digitalocean.com> Cc: Amir Goldstein <amir73il@gmail.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> CC: linux-fsdevel@vger.kernel.org Reviewed-by: Seth Forshee <sforshee@digitalocean.com> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-12-05fs: add i_user_ns() helperChristian Brauner
Since we'll be passing the filesystem's idmapping in even more places in the following patches and we do already dereference struct inode to get to the filesystem's idmapping multiple times add a tiny helper. Link: https://lore.kernel.org/r/20211123114227.3124056-10-brauner@kernel.org (v1) Link: https://lore.kernel.org/r/20211130121032.3753852-10-brauner@kernel.org (v2) Link: https://lore.kernel.org/r/20211203111707.3901969-10-brauner@kernel.org Cc: Seth Forshee <sforshee@digitalocean.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> CC: linux-fsdevel@vger.kernel.org Reviewed-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Seth Forshee <sforshee@digitalocean.com> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-12-05fs: port higher-level mapping helpersChristian Brauner
Enable the mapped_fs{g,u}id() helpers to support filesystems mounted with an idmapping. Apart from core mapping helpers that use mapped_fs{g,u}id() to initialize struct inode's i_{g,u}id fields xfs is the only place that uses these low-level helpers directly. The patch only extends the helpers to be able to take the filesystem idmapping into account. Since we don't actually yet pass the filesystem's idmapping in no functional changes happen. This will happen in a final patch. Link: https://lore.kernel.org/r/20211123114227.3124056-9-brauner@kernel.org (v1) Link: https://lore.kernel.org/r/20211130121032.3753852-9-brauner@kernel.org (v2) Link: https://lore.kernel.org/r/20211203111707.3901969-9-brauner@kernel.org Cc: Seth Forshee <sforshee@digitalocean.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> CC: linux-fsdevel@vger.kernel.org Reviewed-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Seth Forshee <sforshee@digitalocean.com> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-12-03mm: move filemap_range_needs_writeback() into headerJens Axboe
No functional changes in this patch, just in preparation for efficiently calling this light function from the block O_DIRECT handling. Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-12-03fs: account for filesystem mappingsChristian Brauner
Currently we only support idmapped mounts for filesystems mounted without an idmapping. This was a conscious decision mentioned in multiple places (cf. e.g. [1]). As explained at length in [3] it is perfectly fine to extend support for idmapped mounts to filesystem's mounted with an idmapping should the need arise. The need has been there for some time now. Various container projects in userspace need this to run unprivileged and nested unprivileged containers (cf. [2]). Before we can port any filesystem that is mountable with an idmapping to support idmapped mounts we need to first extend the mapping helpers to account for the filesystem's idmapping. This again, is explained at length in our documentation at [3] but I'll give an overview here again. Currently, the low-level mapping helpers implement the remapping algorithms described in [3] in a simplified manner. Because we could rely on the fact that all filesystems supporting idmapped mounts are mounted without an idmapping the translation step from or into the filesystem idmapping could be skipped. In order to support idmapped mounts of filesystem's mountable with an idmapping the translation step we were able to skip before cannot be skipped anymore. A filesystem mounted with an idmapping is very likely to not use an identity mapping and will instead use a non-identity mapping. So the translation step from or into the filesystem's idmapping in the remapping algorithm cannot be skipped for such filesystems. More details with examples can be found in [3]. This patch adds a few new and prepares some already existing low-level mapping helpers to perform the full translation algorithm explained in [3]. The low-level helpers can be written in a way that they only perform the additional translation step when the filesystem is indeed mounted with an idmapping. If the low-level helpers detect that they are not dealing with an idmapped mount they can simply return the relevant k{g,u}id unchanged; no remapping needs to be performed at all. The no_idmapping() helper detects whether the shortcut can be used. If the low-level helpers detected that they are dealing with an idmapped mount but the underlying filesystem is mounted without an idmapping we can rely on the previous shorcut and can continue to skip the translation step from or into the filesystem's idmapping. These checks guarantee that only the minimal amount of work is performed. As before, if idmapped mounts aren't used the low-level helpers are idempotent and no work is performed at all. This patch adds the helpers mapped_k{g,u}id_fs() and mapped_k{g,u}id_user(). Following patches will port all places to replace the old k{g,u}id_into_mnt() and k{g,u}id_from_mnt() with these two new helpers. After the conversion is done k{g,u}id_into_mnt() and k{g,u}id_from_mnt() will be removed. This also concludes the renaming of the mapping helpers we started in [4]. Now, all mapping helpers will started with the "mapped_" prefix making everything nice and consistent. The mapped_k{g,u}id_fs() helpers replace the k{g,u}id_into_mnt() helpers. They are to be used when k{g,u}ids are to be mapped from the vfs, e.g. from from struct inode's i_{g,u}id. Conversely, the mapped_k{g,u}id_user() helpers replace the k{g,u}id_from_mnt() helpers. They are to be used when k{g,u}ids are to be written to disk, e.g. when entering from a system call to change ownership of a file. This patch only introduces the helpers. It doesn't yet convert the relevant places to account for filesystem mounted with an idmapping. [1]: commit 2ca4dcc4909d ("fs/mount_setattr: tighten permission checks") [2]: https://github.com/containers/podman/issues/10374 [3]: Documentations/filesystems/idmappings.rst [4]: commit a65e58e791a1 ("fs: document and rename fsid helpers") Link: https://lore.kernel.org/r/20211123114227.3124056-5-brauner@kernel.org (v1) Link: https://lore.kernel.org/r/20211130121032.3753852-5-brauner@kernel.org (v2) Link: https://lore.kernel.org/r/20211203111707.3901969-5-brauner@kernel.org Cc: Seth Forshee <sforshee@digitalocean.com> Cc: Amir Goldstein <amir73il@gmail.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> CC: linux-fsdevel@vger.kernel.org Reviewed-by: Seth Forshee <sforshee@digitalocean.com> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-12-03fs: tweak fsuidgid_has_mapping()Christian Brauner
If the caller's fs{g,u}id aren't mapped in the mount's idmapping we can return early and skip the check whether the mapped fs{g,u}id also have a mapping in the filesystem's idmapping. If the fs{g,u}id aren't mapped in the mount's idmapping they consequently can't be mapped in the filesystem's idmapping. So there's no point in checking that. Link: https://lore.kernel.org/r/20211123114227.3124056-4-brauner@kernel.org (v1) Link: https://lore.kernel.org/r/20211130121032.3753852-4-brauner@kernel.org (v2) Link: https://lore.kernel.org/r/20211203111707.3901969-4-brauner@kernel.org Cc: Seth Forshee <sforshee@digitalocean.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> CC: linux-fsdevel@vger.kernel.org Reviewed-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Seth Forshee <sforshee@digitalocean.com> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-12-03fs: move mapping helpersChristian Brauner
The low-level mapping helpers were so far crammed into fs.h. They are out of place there. The fs.h header should just contain the higher-level mapping helpers that interact directly with vfs objects such as struct super_block or struct inode and not the bare mapping helpers. Similarly, only vfs and specific fs code shall interact with low-level mapping helpers. And so they won't be made accessible automatically through regular {g,u}id helpers. Link: https://lore.kernel.org/r/20211123114227.3124056-3-brauner@kernel.org (v1) Link: https://lore.kernel.org/r/20211130121032.3753852-3-brauner@kernel.org (v2) Link: https://lore.kernel.org/r/20211203111707.3901969-3-brauner@kernel.org Cc: Seth Forshee <sforshee@digitalocean.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> CC: linux-fsdevel@vger.kernel.org Reviewed-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Seth Forshee <sforshee@digitalocean.com> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-12-03fs: add is_idmapped_mnt() helperChristian Brauner
Multiple places open-code the same check to determine whether a given mount is idmapped. Introduce a simple helper function that can be used instead. This allows us to get rid of the fragile open-coding. We will later change the check that is used to determine whether a given mount is idmapped. Introducing a helper allows us to do this in a single place instead of doing it for multiple places. Link: https://lore.kernel.org/r/20211123114227.3124056-2-brauner@kernel.org (v1) Link: https://lore.kernel.org/r/20211130121032.3753852-2-brauner@kernel.org (v2) Link: https://lore.kernel.org/r/20211203111707.3901969-2-brauner@kernel.org Cc: Seth Forshee <sforshee@digitalocean.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> CC: linux-fsdevel@vger.kernel.org Reviewed-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Seth Forshee <sforshee@digitalocean.com> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-11-17fs: Remove FS_THP_SUPPORTMatthew Wilcox (Oracle)
Instead of setting a bit in the fs_flags to set a bit in the address_space, set the bit in the address_space directly. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
2021-11-11Merge tag 'net-5.16-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from bpf, can and netfilter. Current release - regressions: - bpf: do not reject when the stack read size is different from the tracked scalar size - net: fix premature exit from NAPI state polling in napi_disable() - riscv, bpf: fix RV32 broken build, and silence RV64 warning Current release - new code bugs: - net: fix possible NULL deref in sock_reserve_memory - amt: fix error return code in amt_init(); fix stopping the workqueue - ax88796c: use the correct ioctl callback Previous releases - always broken: - bpf: stop caching subprog index in the bpf_pseudo_func insn - security: fixups for the security hooks in sctp - nfc: add necessary privilege flags in netlink layer, limit operations to admin only - vsock: prevent unnecessary refcnt inc for non-blocking connect - net/smc: fix sk_refcnt underflow on link down and fallback - nfnetlink_queue: fix OOB when mac header was cleared - can: j1939: ignore invalid messages per standard - bpf, sockmap: - fix race in ingress receive verdict with redirect to self - fix incorrect sk_skb data_end access when src_reg = dst_reg - strparser, and tls are reusing qdisc_skb_cb and colliding - ethtool: fix ethtool msg len calculation for pause stats - vlan: fix a UAF in vlan_dev_real_dev() when ref-holder tries to access an unregistering real_dev - udp6: make encap_rcv() bump the v6 not v4 stats - drv: prestera: add explicit padding to fix m68k build - drv: felix: fix broken VLAN-tagged PTP under VLAN-aware bridge - drv: mvpp2: fix wrong SerDes reconfiguration order Misc & small latecomers: - ipvs: auto-load ipvs on genl access - mctp: sanity check the struct sockaddr_mctp padding fields - libfs: support RENAME_EXCHANGE in simple_rename() - avoid double accounting for pure zerocopy skbs" * tag 'net-5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (123 commits) selftests/net: udpgso_bench_rx: fix port argument net: wwan: iosm: fix compilation warning cxgb4: fix eeprom len when diagnostics not implemented net: fix premature exit from NAPI state polling in napi_disable() net/smc: fix sk_refcnt underflow on linkdown and fallback net/mlx5: Lag, fix a potential Oops with mlx5_lag_create_definer() gve: fix unmatched u64_stats_update_end() net: ethernet: lantiq_etop: Fix compilation error selftests: forwarding: Fix packet matching in mirroring selftests vsock: prevent unnecessary refcnt inc for nonblocking connect net: marvell: mvpp2: Fix wrong SerDes reconfiguration order net: ethernet: ti: cpsw_ale: Fix access to un-initialized memory net: stmmac: allow a tc-taprio base-time of zero selftests: net: test_vxlan_under_vrf: fix HV connectivity test net: hns3: allow configure ETS bandwidth of all TCs net: hns3: remove check VF uc mac exist when set by PF net: hns3: fix some mac statistics is always 0 in device version V2 net: hns3: fix kernel crash when unload VF while it is being reset net: hns3: sync rx ring head in echo common pull net: hns3: fix pfc packet number incorrect after querying pfc parameters ...
2021-11-09Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge more updates from Andrew Morton: "87 patches. Subsystems affected by this patch series: mm (pagecache and hugetlb), procfs, misc, MAINTAINERS, lib, checkpatch, binfmt, kallsyms, ramfs, init, codafs, nilfs2, hfs, crash_dump, signals, seq_file, fork, sysvfs, kcov, gdb, resource, selftests, and ipc" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (87 commits) ipc/ipc_sysctl.c: remove fallback for !CONFIG_PROC_SYSCTL ipc: check checkpoint_restore_ns_capable() to modify C/R proc files selftests/kselftest/runner/run_one(): allow running non-executable files virtio-mem: disallow mapping virtio-mem memory via /dev/mem kernel/resource: disallow access to exclusive system RAM regions kernel/resource: clean up and optimize iomem_is_exclusive() scripts/gdb: handle split debug for vmlinux kcov: replace local_irq_save() with a local_lock_t kcov: avoid enable+disable interrupts if !in_task() kcov: allocate per-CPU memory on the relevant node Documentation/kcov: define `ip' in the example Documentation/kcov: include types.h in the example sysv: use BUILD_BUG_ON instead of runtime check kernel/fork.c: unshare(): use swap() to make code cleaner seq_file: fix passing wrong private data seq_file: move seq_escape() to a header signal: remove duplicate include in signal.h crash_dump: remove duplicate include in crash_dump.h crash_dump: fix boolreturn.cocci warning hfs/hfsplus: use WARN_ON for sanity check ...
2021-11-09vfs: keep inodes with page cache off the inode shrinker LRUJohannes Weiner
Historically (pre-2.5), the inode shrinker used to reclaim only empty inodes and skip over those that still contained page cache. This caused problems on highmem hosts: struct inode could put fill lowmem zones before the cache was getting reclaimed in the highmem zones. To address this, the inode shrinker started to strip page cache to facilitate reclaiming lowmem. However, this comes with its own set of problems: the shrinkers may drop actively used page cache just because the inodes are not currently open or dirty - think working with a large git tree. It further doesn't respect cgroup memory protection settings and can cause priority inversions between containers. Nowadays, the page cache also holds non-resident info for evicted cache pages in order to detect refaults. We've come to rely heavily on this data inside reclaim for protecting the cache workingset and driving swap behavior. We also use it to quantify and report workload health through psi. The latter in turn is used for fleet health monitoring, as well as driving automated memory sizing of workloads and containers, proactive reclaim and memory offloading schemes. The consequences of dropping page cache prematurely is that we're seeing subtle and not-so-subtle failures in all of the above-mentioned scenarios, with the workload generally entering unexpected thrashing states while losing the ability to reliably detect it. To fix this on non-highmem systems at least, going back to rotating inodes on the LRU isn't feasible. We've tried (commit a76cf1a474d7 ("mm: don't reclaim inodes with many attached pages")) and failed (commit 69056ee6a8a3 ("Revert "mm: don't reclaim inodes with many attached pages"")). The issue is mostly that shrinker pools attract pressure based on their size, and when objects get skipped the shrinkers remember this as deferred reclaim work. This accumulates excessive pressure on the remaining inodes, and we can quickly eat into heavily used ones, or dirty ones that require IO to reclaim, when there potentially is plenty of cold, clean cache around still. Instead, this patch keeps populated inodes off the inode LRU in the first place - just like an open file or dirty state would. An otherwise clean and unused inode then gets queued when the last cache entry disappears. This solves the problem without reintroducing the reclaim issues, and generally is a bit more scalable than having to wade through potentially hundreds of thousands of busy inodes. Locking is a bit tricky because the locks protecting the inode state (i_lock) and the inode LRU (lru_list.lock) don't nest inside the irq-safe page cache lock (i_pages.xa_lock). Page cache deletions are serialized through i_lock, taken before the i_pages lock, to make sure depopulated inodes are queued reliably. Additions may race with deletions, but we'll check again in the shrinker. If additions race with the shrinker itself, we're protected by the i_lock: if find_inode() or iput() win, the shrinker will bail on the elevated i_count or I_REFERENCED; if the shrinker wins and goes ahead with the inode, it will set I_FREEING and inhibit further igets(), which will cause the other side to create a new instance of the inode instead. Link: https://lkml.kernel.org/r/20210614211904.14420-4-hannes@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Roman Gushchin <guro@fb.com> Cc: Tejun Heo <tj@kernel.org> Cc: Dave Chinner <david@fromorbit.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge misc updates from Andrew Morton: "257 patches. Subsystems affected by this patch series: scripts, ocfs2, vfs, and mm (slab-generic, slab, slub, kconfig, dax, kasan, debug, pagecache, gup, swap, memcg, pagemap, mprotect, mremap, iomap, tracing, vmalloc, pagealloc, memory-failure, hugetlb, userfaultfd, vmscan, tools, memblock, oom-kill, hugetlbfs, migration, thp, readahead, nommu, ksm, vmstat, madvise, memory-hotplug, rmap, zsmalloc, highmem, zram, cleanups, kfence, and damon)" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (257 commits) mm/damon: remove return value from before_terminate callback mm/damon: fix a few spelling mistakes in comments and a pr_debug message mm/damon: simplify stop mechanism Docs/admin-guide/mm/pagemap: wordsmith page flags descriptions Docs/admin-guide/mm/damon/start: simplify the content Docs/admin-guide/mm/damon/start: fix a wrong link Docs/admin-guide/mm/damon/start: fix wrong example commands mm/damon/dbgfs: add adaptive_targets list check before enable monitor_on mm/damon: remove unnecessary variable initialization Documentation/admin-guide/mm/damon: add a document for DAMON_RECLAIM mm/damon: introduce DAMON-based Reclamation (DAMON_RECLAIM) selftests/damon: support watermarks mm/damon/dbgfs: support watermarks mm/damon/schemes: activate schemes based on a watermarks mechanism tools/selftests/damon: update for regions prioritization of schemes mm/damon/dbgfs: support prioritization weights mm/damon/vaddr,paddr: support pageout prioritization mm/damon/schemes: prioritize regions within the quotas mm/damon/selftests: support schemes quotas mm/damon/dbgfs: support quotas of schemes ...
2021-11-06fs: explicitly unregister per-superblock BDIsChristoph Hellwig
Add a new SB_I_ flag to mark superblocks that have an ephemeral bdi associated with them, and unregister it when the superblock is shut down. Link: https://lkml.kernel.org/r/20211021124441.668816-4-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Miquel Raynal <miquel.raynal@bootlin.com> Cc: Richard Weinberger <richard@nod.at> Cc: Vignesh Raghavendra <vigneshr@ti.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-03libfs: Move shmem_exchange to simple_rename_exchangeLorenz Bauer
Move shmem_exchange and make it available to other callers. Suggested-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Miklos Szeredi <mszeredi@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <christian.brauner@ubuntu.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/bpf/20211028094724.59043-2-lmb@cloudflare.com
2021-11-01Merge tag 'for-5.16-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs updates from David Sterba: "The updates this time are more under the hood and enhancing existing features (subpage with compression and zoned namespaces). Performance related: - misc small inode logging improvements (+3% throughput, -11% latency on sample dbench workload) - more efficient directory logging: bulk item insertion, less tree searches and locking - speed up bulk insertion of items into a b-tree, which is used when logging directories, when running delayed items for directories (fsync and transaction commits) and when running the slow path (full sync) of an fsync (bulk creation run time -4%, deletion -12%) Core: - continued subpage support - make defragmentation work - make compression write work - zoned mode - support ZNS (zoned namespaces), zone capacity is number of usable blocks in each zone - add dedicated block group (zoned) for relocation, to prevent out of order writes in some cases - greedy block group reclaim, pick the ones with least usable space first - preparatory work for send protocol updates - error handling improvements - cleanups and refactoring Fixes: - lockdep warnings - in show_devname callback, on seeding device - device delete on loop device due to conversions to workqueues - fix deadlock between chunk allocation and chunk btree modifications - fix tracking of missing device count and status" * tag 'for-5.16-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (140 commits) btrfs: remove root argument from check_item_in_log() btrfs: remove root argument from add_link() btrfs: remove root argument from btrfs_unlink_inode() btrfs: remove root argument from drop_one_dir_item() btrfs: clear MISSING device status bit in btrfs_close_one_device btrfs: call btrfs_check_rw_degradable only if there is a missing device btrfs: send: prepare for v2 protocol btrfs: fix comment about sector sizes supported in 64K systems btrfs: update device path inode time instead of bd_inode fs: export an inode_update_time helper btrfs: fix deadlock when defragging transparent huge pages btrfs: sysfs: convert scnprintf and snprintf to sysfs_emit btrfs: make btrfs_super_block size match BTRFS_SUPER_INFO_SIZE btrfs: update comments for chunk allocation -ENOSPC cases btrfs: fix deadlock between chunk allocation and chunk btree modifications btrfs: zoned: use greedy gc for auto reclaim btrfs: check-integrity: stop storing the block device name in btrfsic_dev_state btrfs: use btrfs_get_dev_args_from_path in dev removal ioctls btrfs: add a btrfs_get_dev_args_from_path helper btrfs: handle device lookup with btrfs_dev_lookup_args ...
2021-10-26fs: export an inode_update_time helperJosef Bacik
If you already have an inode and need to update the time on the inode there is no way to do this properly. Export this helper to allow file systems to update time on the inode so the appropriate handler is called, either ->update_time or generic_update_time. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-10-25fs: get rid of the res2 iocb->ki_complete argumentJens Axboe
The second argument was only used by the USB gadget code, yet everyone pays the overhead of passing a zero to be passed into aio, where it ends up being part of the aio res2 value. Now that everybody is passing in zero, kill off the extra argument. Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18block: add a struct io_comp_batch argument to fops->iopoll()Jens Axboe
struct io_comp_batch contains a list head and a completion handler, which will allow completions to more effciently completed batches of IO. For now, no functional changes in this patch, we just define the io_comp_batch structure and add the argument to the file_operations iopoll handler. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18block: switch polling to be bio basedChristoph Hellwig
Replace the blk_poll interface that requires the caller to keep a queue and cookie from the submissions with polling based on the bio. Polling for the bio itself leads to a few advantages: - the cookie construction can made entirely private in blk-mq.c - the caller does not need to remember the request_queue and cookie separately and thus sidesteps their lifetime issues - keeping the device and the cookie inside the bio allows to trivially support polling BIOs remapping by stacking drivers - a lot of code to propagate the cookie back up the submission path can be removed entirely. Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Mark Wunderlich <mark.wunderlich@intel.com> Link: https://lore.kernel.org/r/20211012111226.760968-15-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18block: replace the spin argument to blk_iopoll with a flags argumentChristoph Hellwig
Switch the boolean spin argument to blk_poll to passing a set of flags instead. This will allow to control polling behavior in a more fine grained way. Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Mark Wunderlich <mark.wunderlich@intel.com> Link: https://lore.kernel.org/r/20211012111226.760968-10-hch@lst.de [axboe: adapt to changed io_uring iopoll] Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-09-09Merge branch 'work.gfs2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull gfs2 setattr updates from Al Viro: "Make it possible for filesystems to use a generic 'may_setattr()' and switch gfs2 to using it" * 'work.gfs2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: gfs2: Switch to may_setattr in gfs2_setattr fs: Move notify_change permission checks into may_setattr
2021-09-09Merge branch 'work.init' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull root filesystem type handling updates from Al Viro: "Teach init/do_mounts.c to handle non-block filesystems, hopefully preventing even more special-cased kludges (such as root=/dev/nfs, etc)" * 'work.init' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fs: simplify get_filesystem_list / get_all_fs_names init: allow mounting arbitrary non-blockdevice filesystems as root init: split get_fs_names
2021-09-04Merge tag 'denywrite-for-5.15' of git://github.com/davidhildenbrand/linuxLinus Torvalds
Pull MAP_DENYWRITE removal from David Hildenbrand: "Remove all in-tree usage of MAP_DENYWRITE from the kernel and remove VM_DENYWRITE. There are some (minor) user-visible changes: - We no longer deny write access to shared libaries loaded via legacy uselib(); this behavior matches modern user space e.g. dlopen(). - We no longer deny write access to the elf interpreter after exec completed, treating it just like shared libraries (which it often is). - We always deny write access to the file linked via /proc/pid/exe: sys_prctl(PR_SET_MM_MAP/EXE_FILE) will fail if write access to the file cannot be denied, and write access to the file will remain denied until the link is effectivel gone (exec, termination, sys_prctl(PR_SET_MM_MAP/EXE_FILE)) -- just as if exec'ing the file. Cross-compiled for a bunch of architectures (alpha, microblaze, i386, s390x, ...) and verified via ltp that especially the relevant tests (i.e., creat07 and execve04) continue working as expected" * tag 'denywrite-for-5.15' of git://github.com/davidhildenbrand/linux: fs: update documentation of get_write_access() and friends mm: ignore MAP_DENYWRITE in ksys_mmap_pgoff() mm: remove VM_DENYWRITE binfmt: remove in-tree usage of MAP_DENYWRITE kernel/fork: always deny write access to current MM exe_file kernel/fork: factor out replacing the current MM exe_file binfmt: don't use MAP_DENYWRITE when loading shared libraries via uselib()
2021-09-03fs: update documentation of get_write_access() and friendsDavid Hildenbrand
As VM_DENYWRITE does no longer exists, let's spring-clean the documentation of get_write_access() and friends. Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: David Hildenbrand <david@redhat.com>
2021-09-02Merge tag 'ovl-update-5.15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs Pull overlayfs update from Miklos Szeredi: - Copy up immutable/append/sync/noatime attributes (Amir Goldstein) - Improve performance by enabling RCU lookup. - Misc fixes and improvements The reason this touches so many files is that the ->get_acl() method now gets a "bool rcu" argument. The ->get_acl() API was updated based on comments from Al and Linus: Link: https://lore.kernel.org/linux-fsdevel/CAJfpeguQxpd6Wgc0Jd3ks77zcsAv_bn0q17L3VNnnmPKu11t8A@mail.gmail.com/ * tag 'ovl-update-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: ovl: enable RCU'd ->get_acl() vfs: add rcu argument to ->get_acl() callback ovl: fix BUG_ON() in may_delete() when called from ovl_cleanup() ovl: use kvalloc in xattr copy-up ovl: update ctime when changing fileattr ovl: skip checking lower file's i_writecount on truncate ovl: relax lookup error on mismatch origin ftype ovl: do not set overlay.opaque for new directories ovl: add ovl_allow_offline_changes() helper ovl: disable decoding null uuid with redirect_dir ovl: consistent behavior for immutable/append-only inodes ovl: copy up sync/noatime fileattr flags ovl: pass ovl_fs to ovl_check_setxattr() fs: add generic helper for filling statx attribute flags
2021-08-31Merge tag 'nfsd-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linuxLinus Torvalds
Pull nfsd updates from Chuck Lever: "New features: - Support for server-side disconnect injection via debugfs - Protocol definitions for new RPC_AUTH_TLS authentication flavor Performance improvements: - Reduce page allocator traffic in the NFSD splice read actor - Reduce CPU utilization in svcrdma's Send completion handler Notable bug fixes: - Stabilize lockd operation when re-exporting NFS mounts - Fix the use of %.*s in NFSD tracepoints - Fix /proc/sys/fs/nfs/nsm_use_hostnames" * tag 'nfsd-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (31 commits) nfsd: fix crash on LOCKT on reexported NFSv3 nfs: don't allow reexport reclaims lockd: don't attempt blocking locks on nfs reexports nfs: don't atempt blocking locks on nfs reexports Keep read and write fds with each nlm_file lockd: update nlm_lookup_file reexport comment nlm: minor refactoring nlm: minor nlm_lookup_file argument change lockd: lockd server-side shouldn't set fl_ops SUNRPC: Add documentation for the fail_sunrpc/ directory SUNRPC: Server-side disconnect injection SUNRPC: Move client-side disconnect injection SUNRPC: Add a /sys/kernel/debug/fail_sunrpc/ directory svcrdma: xpt_bc_xprt is already clear in __svc_rdma_free() nfsd4: Fix forced-expiry locking rpc: fix gss_svc_init cleanup on failure SUNRPC: Add RPC_AUTH_TLS protocol numbers lockd: change the proc_handler for nsm_use_hostnames sysctl: introduce new proc handler proc_dobool SUNRPC: Fix a NULL pointer deref in trace_svc_stats_latency() ...
2021-08-31Merge tag 'for-5.15-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs updates from David Sterba: "The highlights of this round are integrations with fs-verity and idmapped mounts, the rest is usual mix of minor improvements, speedups and cleanups. There are some patches outside of btrfs, namely updating some VFS interfaces, all straightforward and acked. Features: - fs-verity support, using standard ioctls, backward compatible with read-only limitation on inodes with previously enabled fs-verity - idmapped mount support - make mount with rescue=ibadroots more tolerant to partially damaged trees - allow raid0 on a single device and raid10 on two devices, degenerate cases but might be useful as an intermediate step during conversion to other profiles - zoned mode block group auto reclaim can be disabled via sysfs knob Performance improvements: - continue readahead of node siblings even if target node is in memory, could speed up full send (on sample test +11%) - batching of delayed items can speed up creating many files - fsync/tree-log speedups - avoid unnecessary work (gains +2% throughput, -2% run time on sample load) - reduced lock contention on renames (on dbench +4% throughput, up to -30% latency) Fixes: - various zoned mode fixes - preemptive flushing threshold tuning, avoid excessive work on almost full filesystems Core: - continued subpage support, preparation for implementing remaining features like compression and defragmentation; with some limitations, write is now enabled on 64K page systems with 4K sectors, still considered experimental - no readahead on compressed reads - inline extents disabled - disabled raid56 profile conversion and mount - improved flushing logic, fixing early ENOSPC on some workloads - inode flags have been internally split to read-only and read-write incompat bit parts, used by fs-verity - new tree items for fs-verity - descriptor item - Merkle tree item - inode operations extended to be namespace-aware - cleanups and refactoring Generic code changes: - fs: new export filemap_fdatawrite_wbc - fs: removed sync_inode - block: bio_trim argument type fixups - vfs: add namespace-aware lookup" * tag 'for-5.15-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (114 commits) btrfs: reset replace target device to allocation state on close btrfs: zoned: fix ordered extent boundary calculation btrfs: do not do preemptive flushing if the majority is global rsv btrfs: reduce the preemptive flushing threshold to 90% btrfs: tree-log: check btrfs_lookup_data_extent return value btrfs: avoid unnecessarily logging directories that had no changes btrfs: allow idmapped mount btrfs: handle ACLs on idmapped mounts btrfs: allow idmapped INO_LOOKUP_USER ioctl btrfs: allow idmapped SUBVOL_SETFLAGS ioctl btrfs: allow idmapped SET_RECEIVED_SUBVOL ioctls btrfs: relax restrictions for SNAP_DESTROY_V2 with subvolids btrfs: allow idmapped SNAP_DESTROY ioctls btrfs: allow idmapped SNAP_CREATE/SUBVOL_CREATE ioctls btrfs: check whether fsgid/fsuid are mapped during subvolume creation btrfs: allow idmapped permission inode op btrfs: allow idmapped setattr inode op btrfs: allow idmapped tmpfile inode op btrfs: allow idmapped symlink inode op btrfs: allow idmapped mkdir inode op ...
2021-08-30Merge tag 'for-5.15/io_uring-vfs-2021-08-30' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull io_uring mkdirat/symlinkat/linkat support from Jens Axboe: "This adds io_uring support for mkdirat, symlinkat, and linkat" * tag 'for-5.15/io_uring-vfs-2021-08-30' of git://git.kernel.dk/linux-block: io_uring: add support for IORING_OP_LINKAT io_uring: add support for IORING_OP_SYMLINKAT io_uring: add support for IORING_OP_MKDIRAT namei: update do_*() helpers to return ints namei: make do_linkat() take struct filename namei: add getname_uflags() namei: make do_symlinkat() take struct filename namei: make do_mknodat() take struct filename namei: make do_mkdirat() take struct filename namei: change filename_parentat() calling conventions namei: ignore ERR/NULL names in putname()
2021-08-30Merge tag 'io_uring-bio-cache.5-2021-08-30' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull support for struct bio recycling from Jens Axboe: "This adds bio recycling support for polled IO, allowing quick reuse of a bio for high IOPS scenarios via a percpu bio_set list. It's good for almost a 10% improvement in performance, bumping our per-core IO limit from ~3.2M IOPS to ~3.5M IOPS" * tag 'io_uring-bio-cache.5-2021-08-30' of git://git.kernel.dk/linux-block: bio: improve kerneldoc documentation for bio_alloc_kiocb() block: provide bio_clear_hipri() helper block: use the percpu bio cache in __blkdev_direct_IO io_uring: enable use of bio alloc cache block: clear BIO_PERCPU_CACHE flag if polling isn't supported bio: add allocation cache abstraction fs: add kiocb alloc cache flag bio: optimize initialization of a bio
2021-08-30Merge tag 'for-5.15/block-2021-08-30' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block updates from Jens Axboe: "Nothing major in here - lots of good cleanups and tech debt handling, which is also evident in the diffstats. In particular: - Add disk sequence numbers (Matteo) - Discard merge fix (Ming) - Relax disk zoned reporting restrictions (Niklas) - Bio error handling zoned leak fix (Pavel) - Start of proper add_disk() error handling (Luis, Christoph) - blk crypto fix (Eric) - Non-standard GPT location support (Dmitry) - IO priority improvements and cleanups (Damien)o - blk-throtl improvements (Chunguang) - diskstats_show() stack reduction (Abd-Alrhman) - Loop scheduler selection (Bart) - Switch block layer to use kmap_local_page() (Christoph) - Remove obsolete disk_name helper (Christoph) - block_device refcounting improvements (Christoph) - Ensure gendisk always has a request queue reference (Christoph) - Misc fixes/cleanups (Shaokun, Oliver, Guoqing)" * tag 'for-5.15/block-2021-08-30' of git://git.kernel.dk/linux-block: (129 commits) sg: pass the device name to blk_trace_setup block, bfq: cleanup the repeated declaration blk-crypto: fix check for too-large dun_bytes blk-zoned: allow BLKREPORTZONE without CAP_SYS_ADMIN blk-zoned: allow zone management send operations without CAP_SYS_ADMIN block: mark blkdev_fsync static block: refine the disk_live check in del_gendisk mmc: sdhci-tegra: Enable MMC_CAP2_ALT_GPT_TEGRA mmc: block: Support alternative_gpt_sector() operation partitions/efi: Support non-standard GPT location block: Add alternative_gpt_sector() operation bio: fix page leak bio_add_hw_page failure block: remove CONFIG_DEBUG_BLOCK_EXT_DEVT block: remove a pointless call to MINOR() in device_add_disk null_blk: add error handling support for add_disk() virtio_blk: add error handling support for add_disk() block: add error handling for device_add_disk / add_disk block: return errors from disk_alloc_events block: return errors from blk_integrity_add block: call blk_register_queue earlier in device_add_disk ...
2021-08-30Merge tag 'locks-v5.15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux Pull file locking updates from Jeff Layton: "This starts with a couple of fixes for potential deadlocks in the fowner/fasync handling. The next patch removes the old mandatory locking code from the kernel altogether. The last patch cleans up rw_verify_area a bit more after the mandatory locking removal" * tag 'locks-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux: fs: clean up after mandatory file locking support removal fs: remove mandatory file locking support fcntl: fix potential deadlock for &fasync_struct.fa_lock fcntl: fix potential deadlocks for &fown_struct.lock
2021-08-30Merge tag 'hole_punch_for_v5.15-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull fs hole punching vs cache filling race fixes from Jan Kara: "Fix races leading to possible data corruption or stale data exposure in multiple filesystems when hole punching races with operations such as readahead. This is the series I was sending for the last merge window but with your objection fixed - now filemap_fault() has been modified to take invalidate_lock only when we need to create new page in the page cache and / or bring it uptodate" * tag 'hole_punch_for_v5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: filesystems/locking: fix Malformed table warning cifs: Fix race between hole punch and page fault ceph: Fix race between hole punch and page fault fuse: Convert to using invalidate_lock f2fs: Convert to using invalidate_lock zonefs: Convert to using invalidate_lock xfs: Convert double locking of MMAPLOCK to use VFS helpers xfs: Convert to use invalidate_lock xfs: Refactor xfs_isilocked() ext2: Convert to using invalidate_lock ext4: Convert to use mapping->invalidate_lock mm: Add functions to lock invalidate_lock for two mappings mm: Protect operations adding pages to page cache with invalidate_lock documentation: Sync file_operations members with reality mm: Fix comments mentioning i_mutex
2021-08-26nfs: don't allow reexport reclaimsJ. Bruce Fields
In the reexport case, nfsd is currently passing along locks with the reclaim bit set. The client sends a new lock request, which is granted if there's currently no conflict--even if it's possible a conflicting lock could have been briefly held in the interim. We don't currently have any way to safely grant reclaim, so for now let's just deny them all. I'm doing this by passing the reclaim bit to nfs and letting it fail the call, with the idea that eventually the client might be able to do something more forgiving here. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Acked-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-08-24block: mark blkdev_fsync staticChristoph Hellwig
blkdev_fsync is only used inside of block_dev.c since the removal of the raw drіver. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210824151823.1575100-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-23fs: add kiocb alloc cache flagJens Axboe
If this kiocb can safely use the polled bio allocation cache, then this flag must be set. Generally this can be set for polled IO, where we will not see IRQ completions of the request. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>