aboutsummaryrefslogtreecommitdiff
path: root/fs/gfs2
AgeCommit message (Collapse)Author
2022-03-31Merge tag 'gfs2-v5.17-rc4-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull gfs2 fixes from Andreas Gruenbacher: - To avoid deadlocks, actively cancel dlm locking requests when we give up on them. Further dlm operations on the same lock will return -EBUSY until the cancel has been completed, so in that case, wait and repeat. (This is rare.) - Lock inversion fixes in gfs2_inode_lookup() and gfs2_create_inode(). - Some more fallout from the gfs2 mmap + page fault deadlock fixes (merged in commit c03098d4b9ad7: "Merge tag 'gfs2-v5.15-rc5-mmap-fault'"). - Various other minor bug fixes and cleanups. * tag 'gfs2-v5.17-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: gfs2: Make sure FITRIM minlen is rounded up to fs block size gfs2: Make sure not to return short direct writes gfs2: Remove dead code in gfs2_file_read_iter gfs2: Fix gfs2_file_buffered_write endless loop workaround gfs2: Minor retry logic cleanup gfs2: Disable page faults during lockless buffered reads gfs2: Fix should_fault_in_pages() logic gfs2: Remove return value for gfs2_indirect_init gfs2: Initialize gh_error in gfs2_glock_nq gfs2: Make use of list_is_first gfs2: Switch lock order of inode and iopen glock gfs2: cancel timed-out glock requests gfs2: Expect -EBUSY after canceling dlm locking requests gfs2: gfs2_setattr_size error path fix gfs2: assign rgrp glock before compute_bitstructs
2022-03-31gfs2: Make sure FITRIM minlen is rounded up to fs block sizeAndrew Price
Per fstrim(8) we must round up the minlen argument to the fs block size. The current calculation doesn't take into account devices that have a discard granularity and requested minlen less than 1 fs block, so the value can get shifted away to zero in the translation to fs blocks. The zero minlen passed to gfs2_rgrp_send_discards() then allows sb_issue_discard() to be called with nr_sects == 0 which returns -EINVAL and results in gfs2_rgrp_send_discards() returning -EIO. Make sure minlen is never < 1 fs block by taking the max of the requested minlen and the fs block size before comparing to the device's discard granularity and shifting to fs blocks. Fixes: 076f0faa764ab ("GFS2: Fix FITRIM argument handling") Signed-off-by: Andrew Price <anprice@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2022-03-26Merge tag 'for-5.18/write-streams-2022-03-18' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull NVMe write streams removal from Jens Axboe: "This removes the write streams support in NVMe. No vendor ever really shipped working support for this, and they are not interested in supporting it. With the NVMe support gone, we have nothing in the tree that supports this. Remove passing around of the hints. The only discussion point in this patchset imho is the fact that the file specific write hint setting/getting fcntl helpers will now return -1/EINVAL like they did before we supported write hints. No known applications use these functions, I only know of one prototype that I help do for RocksDB, and that's not used. That said, with a change like this, it's always a bit controversial. Alternatively, we could just make them return 0 and pretend it worked. It's placement based hints after all" * tag 'for-5.18/write-streams-2022-03-18' of git://git.kernel.dk/linux-block: fs: remove fs.f_write_hint fs: remove kiocb.ki_hint block: remove the per-bio/request write hint nvme: remove support or stream based temperature hint
2022-03-24gfs2: Make sure not to return short direct writesAndreas Gruenbacher
When direct writes fail with -ENOTBLK because we're writing into a hole (gfs2_iomap_begin()) or because of a page invalidation failure (iomap_dio_rw()), we're falling back to buffered writes. In that case, when we lose the inode glock in gfs2_file_buffered_write(), we want to re-acquire it instead of returning a short write. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2022-03-24gfs2: Remove dead code in gfs2_file_read_iterAndreas Gruenbacher
Function iomap_dio_rw() only returns -ENOTBLK for write requests and gfs2_file_direct_read() no longer returns -ENOTBLK since commit 1d45bb7f9d2a5 ("gfs2: Use iomap for stuffed direct I/O reads"), so there is no need to check for -ENOTBLK in gfs2_file_read_iter() anymore. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2022-03-24gfs2: Fix gfs2_file_buffered_write endless loop workaroundAndreas Gruenbacher
Since commit 554c577cee95b, gfs2_file_buffered_write() can accidentally return a truncated iov_iter, which might confuse callers. Fix that. Fixes: 554c577cee95b ("gfs2: Prevent endless loops in gfs2_file_buffered_write") Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2022-03-23gfs2: Minor retry logic cleanupAndreas Gruenbacher
Clean up the retry logic in the read and write functions somewhat. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2022-03-23gfs2: Disable page faults during lockless buffered readsAndreas Gruenbacher
During lockless buffered reads, filemap_read() holds page cache page references while trying to copy data to the user-space buffer. The calling process isn't holding the inode glock, but the page references it holds prevent those pages from being removed from the page cache, and that prevents the underlying inode glock from being moved to another node. Thus, we can end up in the same kinds of distributed deadlock situations as with normal (non-lockless) buffered reads. Fix that by disabling page faults during lockless reads as well. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2022-03-23gfs2: Fix should_fault_in_pages() logicAndreas Gruenbacher
Fix the fault-in window size logic: * Use a maximum window size of 1 MiB instead of BIO_MAX_VECS * PAGE_SIZE. The previous window size was always one page because the pages variable was accidentally being defined and then redefined in should_fault_in_pages(). * The nr_dirtied heuristic for guessing when there might be memory pressure often results in very small window sizes. Don't let nr_dirtied drop below 8 pages (as btrfs does). * Compute the window size in units of bytes, not pages. * Account for page overlap (unaligned iterators). Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2022-03-22Merge tag 'folio-5.18b' of git://git.infradead.org/users/willy/pagecacheLinus Torvalds
Pull filesystem folio updates from Matthew Wilcox: "Primarily this series converts some of the address_space operations to take a folio instead of a page. Notably: - a_ops->is_partially_uptodate() takes a folio instead of a page and changes the type of the 'from' and 'count' arguments to make it obvious they're bytes. - a_ops->invalidatepage() becomes ->invalidate_folio() and has a similar type change. - a_ops->launder_page() becomes ->launder_folio() - a_ops->set_page_dirty() becomes ->dirty_folio() and adds the address_space as an argument. There are a couple of other misc changes up front that weren't worth separating into their own pull request" * tag 'folio-5.18b' of git://git.infradead.org/users/willy/pagecache: (53 commits) fs: Remove aops ->set_page_dirty fb_defio: Use noop_dirty_folio() fs: Convert __set_page_dirty_no_writeback to noop_dirty_folio fs: Convert __set_page_dirty_buffers to block_dirty_folio nilfs: Convert nilfs_set_page_dirty() to nilfs_dirty_folio() mm: Convert swap_set_page_dirty() to swap_dirty_folio() ubifs: Convert ubifs_set_page_dirty to ubifs_dirty_folio f2fs: Convert f2fs_set_node_page_dirty to f2fs_dirty_node_folio f2fs: Convert f2fs_set_data_page_dirty to f2fs_dirty_data_folio f2fs: Convert f2fs_set_meta_page_dirty to f2fs_dirty_meta_folio afs: Convert afs_dir_set_page_dirty() to afs_dir_dirty_folio() btrfs: Convert extent_range_redirty_for_io() to use folios fs: Convert trivial uses of __set_page_dirty_nobuffers to filemap_dirty_folio btrfs: Convert from set_page_dirty to dirty_folio fscache: Convert fscache_set_page_dirty() to fscache_dirty_folio() fs: Add aops->dirty_folio fs: Remove aops->launder_page orangefs: Convert launder_page to launder_folio nfs: Convert from launder_page to launder_folio fuse: Convert from launder_page to launder_folio ...
2022-03-22Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge updates from Andrew Morton: - A few misc subsystems: kthread, scripts, ntfs, ocfs2, block, and vfs - Most the MM patches which precede the patches in Willy's tree: kasan, pagecache, gup, swap, shmem, memcg, selftests, pagemap, mremap, sparsemem, vmalloc, pagealloc, memory-failure, mlock, hugetlb, userfaultfd, vmscan, compaction, mempolicy, oom-kill, migration, thp, cma, autonuma, psi, ksm, page-poison, madvise, memory-hotplug, rmap, zswap, uaccess, ioremap, highmem, cleanups, kfence, hmm, and damon. * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (227 commits) mm/damon/sysfs: remove repeat container_of() in damon_sysfs_kdamond_release() Docs/ABI/testing: add DAMON sysfs interface ABI document Docs/admin-guide/mm/damon/usage: document DAMON sysfs interface selftests/damon: add a test for DAMON sysfs interface mm/damon/sysfs: support DAMOS stats mm/damon/sysfs: support DAMOS watermarks mm/damon/sysfs: support schemes prioritization mm/damon/sysfs: support DAMOS quotas mm/damon/sysfs: support DAMON-based Operation Schemes mm/damon/sysfs: support the physical address space monitoring mm/damon/sysfs: link DAMON for virtual address spaces monitoring mm/damon: implement a minimal stub for sysfs-based DAMON interface mm/damon/core: add number of each enum type values mm/damon/core: allow non-exclusive DAMON start/stop Docs/damon: update outdated term 'regions update interval' Docs/vm/damon/design: update DAMON-Idle Page Tracking interference handling Docs/vm/damon: call low level monitoring primitives the operations mm/damon: remove unnecessary CONFIG_DAMON option mm/damon/paddr,vaddr: remove damon_{p,v}a_{target_valid,set_operations}() mm/damon/dbgfs-test: fix is_target_id() change ...
2022-03-22fs: allocate inode by using alloc_inode_sb()Muchun Song
The inode allocation is supposed to use alloc_inode_sb(), so convert kmem_cache_alloc() of all filesystems to alloc_inode_sb(). Link: https://lkml.kernel.org/r/20220228122126.37293-5-songmuchun@bytedance.com Signed-off-by: Muchun Song <songmuchun@bytedance.com> Acked-by: Theodore Ts'o <tytso@mit.edu> [ext4] Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Cc: Alex Shi <alexs@kernel.org> Cc: Anna Schumaker <Anna.Schumaker@Netapp.com> Cc: Chao Yu <chao@kernel.org> Cc: Dave Chinner <david@fromorbit.com> Cc: Fam Zheng <fam.zheng@bytedance.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kari Argillander <kari.argillander@gmail.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Xiongchun Duan <duanxiongchun@bytedance.com> Cc: Yang Shi <shy828301@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-21Merge tag 'for-5.18/block-2022-03-18' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block updates from Jens Axboe: - BFQ cleanups and fixes (Yu, Zhang, Yahu, Paolo) - blk-rq-qos completion fix (Tejun) - blk-cgroup merge fix (Tejun) - Add offline error return value to distinguish it from an IO error on the device (Song) - IO stats fixes (Zhang, Christoph) - blkcg refcount fixes (Ming, Yu) - Fix for indefinite dispatch loop softlockup (Shin'ichiro) - blk-mq hardware queue management improvements (Ming) - sbitmap dead code removal (Ming, John) - Plugging merge improvements (me) - Show blk-crypto capabilities in sysfs (Eric) - Multiple delayed queue run improvement (David) - Block throttling fixes (Ming) - Start deprecating auto module loading based on dev_t (Christoph) - bio allocation improvements (Christoph, Chaitanya) - Get rid of bio_devname (Christoph) - bio clone improvements (Christoph) - Block plugging improvements (Christoph) - Get rid of genhd.h header (Christoph) - Ensure drivers use appropriate flush helpers (Christoph) - Refcounting improvements (Christoph) - Queue initialization and teardown improvements (Ming, Christoph) - Misc fixes/improvements (Barry, Chaitanya, Colin, Dan, Jiapeng, Lukas, Nian, Yang, Eric, Chengming) * tag 'for-5.18/block-2022-03-18' of git://git.kernel.dk/linux-block: (127 commits) block: cancel all throttled bios in del_gendisk() block: let blkcg_gq grab request queue's refcnt block: avoid use-after-free on throttle data block: limit request dispatch loop duration block/bfq-iosched: Fix spelling mistake "tenative" -> "tentative" sr: simplify the local variable initialization in sr_block_open() block: don't merge across cgroup boundaries if blkcg is enabled block: fix rq-qos breakage from skipping rq_qos_done_bio() block: flush plug based on hardware and software queue order block: ensure plug merging checks the correct queue at least once block: move rq_qos_exit() into disk_release() block: do more work in elevator_exit block: move blk_exit_queue into disk_release block: move q_usage_counter release into blk_queue_release block: don't remove hctx debugfs dir from blk_mq_exit_queue block: move blkcg initialization/destroy into disk allocation/release handler sr: implement ->free_disk to simplify refcounting sd: implement ->free_disk to simplify refcounting sd: delay calling free_opal_dev sd: call sd_zbc_release_disk before releasing the scsi_device reference ...
2022-03-16fs: Convert __set_page_dirty_buffers to block_dirty_folioMatthew Wilcox (Oracle)
Convert all callers; mostly this is just changing the aops to point at it, but a few implementations need a little more work. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Acked-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Tested-by: Mike Marshall <hubcap@omnibond.com> # orangefs Tested-by: David Howells <dhowells@redhat.com> # afs
2022-03-15fs: Convert trivial uses of __set_page_dirty_nobuffers to filemap_dirty_folioMatthew Wilcox (Oracle)
These filesystems use __set_page_dirty_nobuffers() either directly or with a very thin wrapper; convert them en masse. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Acked-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Tested-by: Mike Marshall <hubcap@omnibond.com> # orangefs Tested-by: David Howells <dhowells@redhat.com> # afs
2022-03-15gfs2: Convert invalidatepage to invalidate_folioMatthew Wilcox (Oracle)
This is a straightforward conversion. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Bob Peterson <rpeterso@redhat.com> Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Acked-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Tested-by: Mike Marshall <hubcap@omnibond.com> # orangefs Tested-by: David Howells <dhowells@redhat.com> # afs
2022-03-15fs: Turn block_invalidatepage into block_invalidate_folioMatthew Wilcox (Oracle)
Remove special-casing of a NULL invalidatepage, since there is no more block_invalidatepage. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Acked-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Tested-by: Mike Marshall <hubcap@omnibond.com> # orangefs Tested-by: David Howells <dhowells@redhat.com> # afs
2022-03-15iomap: Remove iomap_invalidatepage()Matthew Wilcox (Oracle)
Use iomap_invalidate_folio() in all the iomap-based filesystems and rename the iomap_invalidatepage tracepoint. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Acked-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Tested-by: Mike Marshall <hubcap@omnibond.com> # orangefs Tested-by: David Howells <dhowells@redhat.com> # afs
2022-03-07block: remove the per-bio/request write hintChristoph Hellwig
With the NVMe support for this gone, there are no consumers of these hints left, so remove them. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220304175556.407719-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-03-07Merge branch 'for-5.18/block' into for-5.18/write-streamsJens Axboe
* for-5.18/block: (96 commits) block: remove bio_devname ext4: stop using bio_devname raid5-ppl: stop using bio_devname raid1: stop using bio_devname md-multipath: stop using bio_devname dm-integrity: stop using bio_devname dm-crypt: stop using bio_devname pktcdvd: remove a pointless debug check in pkt_submit_bio block: remove handle_bad_sector block: fix and cleanup bio_check_ro bfq: fix use-after-free in bfq_dispatch_request blk-crypto: show crypto capabilities in sysfs block: don't delete queue kobject before its children block: simplify calling convention of elv_unregister_queue() block: remove redundant semicolon block: default BLOCK_LEGACY_AUTOLOAD to y block: update io_ticks when io hang block, bfq: don't move oom_bfqq block, bfq: avoid moving bfqq to it's parent bfqg block, bfq: cleanup bfq_bfqq_to_bfqg() ...
2022-03-03gfs2: Remove return value for gfs2_indirect_initBob Peterson
The return value from function gfs2_indirect_init is never used, so remove it. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2022-02-15gfs2: Initialize gh_error in gfs2_glock_nqAndreas Gruenbacher
The gh_error field if a glock holder is initialized to zero in gfs2_holder_init(). When a locking operation fails, gh_error is set to an error code; when it succeeds, the gh_error value is left unchanged. The field isn't initialized in gfs2_holder_reinit(), which is a problem. Instead of fixing that directly, initialize gh_error in gfs2_glock_nq(). That also obsoletes the assignment in do_flock(). Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2022-02-15gfs2: Make use of list_is_firstAndreas Gruenbacher
Use list_is_first() instead of open-coding it. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2022-02-15gfs2: Switch lock order of inode and iopen glockAndreas Gruenbacher
This patch tries to fix the continual ABBA deadlocks we keep having between the iopen and inode glocks. This switches the lock order in gfs2_inode_lookup and gfs2_create_inode so the iopen glock is always locked first. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2022-02-15gfs2: cancel timed-out glock requestsAndreas Gruenbacher
The gfs2 evict code tries to upgrade the iopen glock from SH to EX. If the attempt to upgrade times out, gfs2 needs to tell dlm to cancel the lock request or it can deadlock. We also need to wake up the process waiting for the lock when dlm sends its AST back to gfs2. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2022-02-15gfs2: Expect -EBUSY after canceling dlm locking requestsAndreas Gruenbacher
Due to the asynchronous nature of the dlm api, when we request a pending locking request to be canceled with dlm_unlock(DLM_LKF_CANCEL), the locking request will either complete before it could be canceled, or the cancellation will succeed. In either case, gdlm_ast will be called once and the status will indicate the outcome of the locking request, with -DLM_ECANCEL indicating a canceled request. Inside dlm, when a locking request completes before its cancel request could be processed, gdlm_ast will be called, but the lock will still be considered busy until a DLM_MSG_CANCEL_REPLY message completes the cancel request. During that time, successive dlm_lock() or dlm_unlock() requests for that lock will return -EBUSY. In other words, waiting for the gdlm_ast call before issuing the next locking request is not enough. There is no way of waiting for a cancel request to actually complete, either. We rarely cancel locking requests, but when we do, we don't know when the next locking request for that lock will occur. This means that any dlm_lock() or dlm_unlock() call can potentially return -EBUSY. When that happens, this patch simply repeats the request after a short pause. This workaround could be improved upon by tracking for which dlm locks cancel requests have been issued, but that isn't strictly necessary and it would complicate the code. We haven't seen -EBUSY errors from dlm without cancel requests. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2022-02-15gfs2: gfs2_setattr_size error path fixAndreas Gruenbacher
When gfs2_setattr_size() fails, it calls gfs2_rs_delete(ip, NULL) to get rid of any reservations the inode may have. Instead, it should pass in the inode's write count as the second parameter to allow gfs2_rs_delete() to figure out if the inode has any writers left. In a next step, there are two instances of gfs2_rs_delete(ip, NULL) left where we know that there can be no other users of the inode. Replace those with gfs2_rs_deltree(&ip->i_res) to avoid the unnecessary write count check. With that, gfs2_rs_delete() is only called with the inode's actual write count, so get rid of the second parameter. Fixes: a097dc7e24cb ("GFS2: Make rgrp reservations part of the gfs2_inode structure") Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2022-02-15gfs2: assign rgrp glock before compute_bitstructsBob Peterson
Before this patch, function read_rindex_entry called compute_bitstructs before it allocated a glock for the rgrp. But if compute_bitstructs found a problem with the rgrp, it called gfs2_consist_rgrpd, and that called gfs2_dump_glock for rgd->rd_gl which had not yet been assigned. read_rindex_entry compute_bitstructs gfs2_consist_rgrpd gfs2_dump_glock <---------rgd->rd_gl was not set. This patch changes read_rindex_entry so it assigns an rgrp glock before calling compute_bitstructs so gfs2_dump_glock does not reference an unassigned pointer. If an error is discovered, the glock must also be put, so a new goto and label were added. Reported-by: syzbot+c6fd14145e2f62ca0784@syzkaller.appspotmail.com Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2022-02-11Merge tag 'gfs2-v5.16-rc3-fixes2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull gfs2 fixes from Andreas Gruenbacher: - Revert debug commit that causes unexpected data corruption - Fix muti-block reservation regression * tag 'gfs2-v5.16-rc3-fixes2' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: gfs2: Fix gfs2_release for non-writers regression Revert "gfs2: check context in gfs2_glock_put"
2022-02-11gfs2: Fix gfs2_release for non-writers regressionBob Peterson
When a file is opened for writing, the vfs code (do_dentry_open) calls get_write_access for the inode, thus incrementing the inode's write count. That writer normally then creates a multi-block reservation for the inode (i_res) that can be re-used by other writers, which speeds up writes for applications that stupidly loop on open/write/close. When the writes are all done, the multi-block reservation should be deleted when the file is closed by the last "writer." Commit 0ec9b9ea4f83 broke that concept when it moved the call to gfs2_rs_delete before the check for FMODE_WRITE. Non-writers have no business removing the multi-block reservations of writers. In fact, if someone opens and closes the file for RO while a writer has a multi-block reservation, the RO closer will delete the reservation midway through the write, and this results in: kernel BUG at fs/gfs2/rgrp.c:677! (or thereabouts) which is: BUG_ON(rs->rs_requested); from function gfs2_rs_deltree. This patch moves the check back inside the check for FMODE_WRITE. Fixes: 0ec9b9ea4f83 ("gfs2: Check for active reservation in gfs2_release") Cc: stable@vger.kernel.org # v5.12+ Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2022-02-11Revert "gfs2: check context in gfs2_glock_put"Andreas Gruenbacher
It turns out that the might_sleep() call that commit 660a6126f8c3 adds is triggering occasional data corruption in testing. We're not sure about the root cause yet, but since this commit was added as a debugging aid only, revert it for now. This reverts commit 660a6126f8c3208f6df8d552039cda078a8426d1. Fixes: 660a6126f8c3 ("gfs2: check context in gfs2_glock_put") Cc: stable@vger.kernel.org # v5.16+ Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2022-02-02block: pass a block_device and opf to bio_allocChristoph Hellwig
Pass the block_device and operation that we plan to use this bio for to bio_alloc to optimize the assignment. NULL/0 can be passed, both for the passthrough case on a raw request_queue and to temporarily avoid refactoring some nasty code. Also move the gfp_mask argument after the nr_vecs argument for a much more logical calling convention matching what most of the kernel does. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220124091107.642561-18-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-02block: remove genhd.hChristoph Hellwig
There is no good reason to keep genhd.h separate from the main blkdev.h header that includes it. So fold the contents of genhd.h into blkdev.h and remove genhd.h entirely. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20220124093913.742411-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-01-12Merge tag 'driver-core-5.17-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here is the set of changes for the driver core for 5.17-rc1. Lots of little things here, including: - kobj_type cleanups - auxiliary_bus documentation updates - auxiliary_device conversions for some drivers (relevant subsystems all have provided acks for these) - kernfs lock contention reduction for some workloads - other tiny cleanups and changes. All of these have been in linux-next for a while with no reported issues" * tag 'driver-core-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (43 commits) kobject documentation: remove default_attrs information drivers/firmware: Add missing platform_device_put() in sysfb_create_simplefb debugfs: lockdown: Allow reading debugfs files that are not world readable driver core: Make bus notifiers in right order in really_probe() driver core: Move driver_sysfs_remove() after driver_sysfs_add() firmware: edd: remove empty default_attrs array firmware: dmi-sysfs: use default_groups in kobj_type qemu_fw_cfg: use default_groups in kobj_type firmware: memmap: use default_groups in kobj_type sh: sq: use default_groups in kobj_type headers/uninline: Uninline single-use function: kobject_has_children() devtmpfs: mount with noexec and nosuid driver core: Simplify async probe test code by using ktime_ms_delta() nilfs2: use default_groups in kobj_type kobject: remove kset from struct kset_uevent_ops callbacks driver core: make kobj_type constant. driver core: platform: document registration-failure requirement vdpa/mlx5: Use auxiliary_device driver data helpers net/mlx5e: Use auxiliary_device driver data helpers soundwire: intel: Use auxiliary_device driver data helpers ...
2022-01-11gfs2: dump inode object for iopen glocksBob Peterson
Before this patch, glock dumps would not dump the gl_object for iopen glocks. This information can help us debug problems related to eviction: when AN iopen glock is blocked we can see the status of its underlying inode and its flags, etc. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-12-28kobject: remove kset from struct kset_uevent_ops callbacksGreg Kroah-Hartman
There is no need to pass the pointer to the kset in the struct kset_uevent_ops callbacks as no one uses it, so just remove that pointer entirely. Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Wedson Almeida Filho <wedsonaf@google.com> Link: https://lore.kernel.org/r/20211227163924.3970661-1-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-04gfs2: Fix gfs2_instantiate descriptionAndreas Gruenbacher
The description of gfs2_instantiate accidentally lists a glock argument, but the function takes a glock holder. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-12-04gfs2: Remove redundant check for GLF_INSTANTIATE_NEEDEDAndreas Gruenbacher
If the GLF_INSTANTIATE_NEEDED flag isn't set, gfs2_instantiate() is a no-op. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-12-04gfs2: remove redundant set of INSTANTIATE_NEEDEDBob Peterson
Function rgrp_go_inval calls gfs2_rgrp_brelse to invalidate the in-core rgrp structures. After the call it set GLF_INSTANTIATE_NEEDED, which is redundant, since gfs2_rgrp_brelse also sets it. This patch simply removes the redundant set_bit. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-12-04gfs2: Fix __gfs2_holder_init function name in kernel-doc commentAndreas Gruenbacher
The function name in the kernel-doc comment wasn't updated when the function was renamed. Fixes: b016d9a84abd ("gfs2: Save ip from gfs2_glock_nq_init") Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-12-02gfs2: gfs2_create_inode reworkAndreas Gruenbacher
When gfs2_lookup_by_inum() calls gfs2_inode_lookup() for an uncached inode, gfs2_inode_lookup() will place a new tentative inode into the inode cache before verifying that there is a valid inode at the given address. This can race with gfs2_create_inode() which doesn't check for duplicates inodes. gfs2_create_inode() will try to assign the new inode to the corresponding inode glock, and glock_set_object() will complain that the glock is still in use by gfs2_inode_lookup's tentative inode. We noticed this bug after adding commit 486408d690e1 ("gfs2: Cancel remote delete work asynchronously") which allowed delete_work_func() to race with gfs2_create_inode(), but the same race exists for open-by-handle. Fix that by switching from insert_inode_hash() to insert_inode_locked4(), which does check for duplicate inodes. We know we've just managed to to allocate the new inode, so an inode tentatively created by gfs2_inode_lookup() will eventually go away and insert_inode_locked4() will always succeed. In addition, don't flush the inode glock work anymore (this can now only make things worse) and clean up glock_{set,clear}_object for the inode glock somewhat. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-12-02gfs2: gfs2_inode_lookup reworkAndreas Gruenbacher
Rework gfs2_inode_lookup() to only set up the new inode's glocks after verifying that the new inode is valid. There is no need for flushing the inode glock work queue anymore now, so remove that as well. While at it, get rid of the useless wrapper around iget5_locked() and its unnecessary is_bad_inode() check. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-12-02gfs2: gfs2_inode_lookup cleanupAndreas Gruenbacher
In gfs2_inode_lookup, once the inode has been looked up, we check if the inode generation (no_formal_ino) is the one we're looking for. If it isn't and the inode wasn't in the inode cache, we discard the newly looked up inode. This is unnecessary, complicates the code, and makes future changes to gfs2_inode_lookup harder, so change the code to retain newly looked up inodes instead. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-12-02gfs2: Fix remote demote of weak glock holdersAndreas Gruenbacher
When we mock up a temporary holder in gfs2_glock_cb to demote weak holders in response to a remote locking conflict, we don't set the HIF_HOLDER flag. This causes function may_grant to BUG. Fix by setting the missing HIF_HOLDER flag in the mock glock holder. In addition, define the mock glock holder where it is used. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-11-10gfs2: Prevent endless loops in gfs2_file_buffered_writeAndreas Gruenbacher
Currently, instead of performing a short write, iomap_file_buffered_write will fail when part of its iov iterator cannot be read. In contrast, gfs2_file_buffered_write will loop around if it can read part of the iov iterator, so we can end up in an endless loop. This should be fixed in iomap_file_buffered_write (and also generic_perform_write), but this comes a bit late in the 5.16 development cycle, so work around it in the filesystem by trimming the iov iterator to the known-good size for now. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-11-08gfs2: Fix "Introduce flag for glock holder auto-demotion"Andreas Gruenbacher
Function demote_incompat_holders iterates over the list of glock holders with list_for_each_entry, and it then sometimes removes the current holder from the list. This will get the loop stuck; we must use list_for_each_entry_safe instead. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-11-06gfs2: Fix length of holes reported at end-of-fileAndreas Gruenbacher
Fix the length of holes reported at the end of a file: the length is relative to the beginning of the extent, not the seek position which is rounded down to the filesystem block size. This bug went unnoticed for some time, but is now caught by the following assertion in iomap_iter_done(): WARN_ON_ONCE(iter->iomap.offset + iter->iomap.length <= iter->pos) Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-11-06gfs2: release iopen glock early in evictBob Peterson
Before this patch, evict would clear the iopen glock's gl_object after releasing the inode glock. In the meantime, another process could reuse the same block and thus glocks for a new inode. It would lock the inode glock (exclusively), and then the iopen glock (shared). The shared locking mode doesn't provide any ordering against the evict, so by the time the iopen glock is reused, evict may not have gotten to setting gl_object to NULL. Fix that by releasing the iopen glock before the inode glock in gfs2_evict_inode. Signed-off-by: Bob Peterson <rpeterso@redhat.com>gl_object Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-11-05gfs2: Fix atomic bug in gfs2_instantiateAndreas Gruenbacher
Replace test_bit() + set_bit() with test_and_set_bit() where we need an atomic operation. Use clear_and_wake_up_bit() instead of open coding it. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-11-03gfs2: Only dereference i->iov when iter_is_iovec(i)Andreas Gruenbacher
Only dereference i->iov after establishing that i is of type ITER_IOVEC. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>