aboutsummaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2020-03-02xfs: remove XFS_DA_OP_INCOMPLETEChristoph Hellwig
Now that we use the on-disk flags field also for the interface to the lower level attr routines we can use the XFS_ATTR_INCOMPLETE definition from the on-disk format directly instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Chandan Rajendra <chandanrlinux@gmail.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-02xfs: clean up the attr flag confusionChristoph Hellwig
The ATTR_* flags have a long IRIX history, where they a userspace interface, the on-disk format and an internal interface. We've split out the on-disk interface to the XFS_ATTR_* values, but despite (or because?) of that the flag have still been a mess. Switch the internal interface to pass the on-disk XFS_ATTR_* flags for the namespace and the Linux XATTR_* flags for the actual flags instead. The ATTR_* values that are actually used are move to xfs_fs.h with a new XFS_IOC_* prefix to not conflict with the userspace version that has the same name and must have the same value. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Chandan Rajendra <chandanrlinux@gmail.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-02xfs: clean up the ATTR_REPLACE checksChristoph Hellwig
Remove superflous braces, elses after return statements and use a goto label to merge common error handling. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Chandan Rajendra <chandanrlinux@gmail.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-02xfs: improve xfs_forget_aclChristoph Hellwig
Move the function to xfs_acl.c and provide a proper stub for the !CONFIG_XFS_POSIX_ACL case. Lift the flags check to the caller as it nicely fits in there. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Chandan Rajendra <chandanrlinux@gmail.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-02xfs: lift cursor copy in/out into xfs_ioc_attr_listChristoph Hellwig
Lift the common code to copy the cursor from and to user space into xfs_ioc_attr_list. Note that this means we copy in twice now as the cursor is in the middle of the conaining structure, but we never touch the memory for the original copy. Doing so keeps the cursor handling isolated in the common helper. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Chandan Rajendra <chandanrlinux@gmail.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-02xfs: lift buffer allocation into xfs_ioc_attr_listChristoph Hellwig
Lift the buffer allocation from the two callers into xfs_ioc_attr_list. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Chandan Rajendra <chandanrlinux@gmail.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-02xfs: lift common checks into xfs_ioc_attr_listChristoph Hellwig
Lift the flags and bufsize checks from both callers into the common code in xfs_ioc_attr_list. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Chandan Rajendra <chandanrlinux@gmail.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-02xfs: rename xfs_attr_list_int to xfs_attr_listChristoph Hellwig
The version taking the context structure is the main interface to list attributes, so drop the _int postfix. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Chandan Rajendra <chandanrlinux@gmail.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-02xfs: move the legacy xfs_attr_list to xfs_ioctl.cChristoph Hellwig
The old xfs_attr_list code is only used by the attrlist by handle ioctl. Move it to xfs_ioctl.c with its user. Also move the attrlist and attrlist_ent structure to xfs_fs.h, as they are exposed user ABIs. They are used through libattr headers with the same name by at least xfsdump. Also document this relation so that it doesn't require a research project to figure out. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Chandan Rajendra <chandanrlinux@gmail.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-02xfs: open code ATTR_ENTSIZEChristoph Hellwig
Replace a single use macro containing open-coded variants of standard helpers with direct calls to the standard helpers. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Chandan Rajendra <chandanrlinux@gmail.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-02xfs: remove the unused ATTR_ENTRY macroChristoph Hellwig
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-02xfs: cleanup struct xfs_attr_list_contextChristoph Hellwig
Replace the alist char pointer with a void buffer given that different callers use it in different ways. Use the chance to remove the typedef and reduce the indentation of the struct definition so that it doesn't overflow 80 char lines all over. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Chandan Rajendra <chandanrlinux@gmail.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-02xfs: factor out a xfs_attr_match helperChristoph Hellwig
Factor out a helper that compares an on-disk attr vs the name, length and flags specified in struct xfs_da_args. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Chandan Rajendra <chandanrlinux@gmail.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-02xfs: replace ATTR_KERNOTIME with XFS_DA_OP_NOTIMEChristoph Hellwig
op_flags with the XFS_DA_OP_* flags is the usual place for in-kernel only flags, so move the notime flag there. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Chandan Rajendra <chandanrlinux@gmail.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-02xfs: remove ATTR_ALLOC and XFS_DA_OP_ALLOCVALChristoph Hellwig
Use a NULL args->value as the indicator to lazily allocate a buffer instead, and let the caller always free args->value instead of duplicating the cleanup. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Chandan Rajendra <chandanrlinux@gmail.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-02xfs: remove ATTR_KERNOVALChristoph Hellwig
We can just pass down the Linux convention of a zero valuelen to just query for the existance of an attribute to the low-level code instead. The use in the legacy xfs_attr_list code only used by the ioctl interface was already dead code, as the callers check that the flag is not present. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Chandan Rajendra <chandanrlinux@gmail.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-02xfs: remove the xfs_inode argument to xfs_attr_get_ilockedChristoph Hellwig
The inode can easily be derived from the args structure. Also don't bother with else statements after early returns. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Chandan Rajendra <chandanrlinux@gmail.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-02xfs: pass an initialized xfs_da_args to xfs_attr_getChristoph Hellwig
Instead of converting from one style of arguments to another in xfs_attr_set, pass the structure from higher up in the call chain. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Chandan Rajendra <chandanrlinux@gmail.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-02xfs: pass an initialized xfs_da_args structure to xfs_attr_setChristoph Hellwig
Instead of converting from one style of arguments to another in xfs_attr_set, pass the structure from higher up in the call chain. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Chandan Rajendra <chandanrlinux@gmail.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-02xfs: turn xfs_da_args.value into a void pointerChristoph Hellwig
The xattr values are blobs and should not be typed. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-02xfs: remove the MAXNAMELEN check from xfs_attr_args_initChristoph Hellwig
All the callers already check the length when allocating the in-kernel xattrs buffers. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Chandan Rajendra <chandanrlinux@gmail.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-02xfs: remove the name == NULL check from xfs_attr_args_initChristoph Hellwig
All callers provide a valid name pointer, remove the redundant check. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Chandan Rajendra <chandanrlinux@gmail.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-02xfs: factor out a helper for a single XFS_IOC_ATTRMULTI_BY_HANDLE opChristoph Hellwig
Add a new helper to handle a single attr multi ioctl operation that can be shared between the native and compat ioctl implementation. There is a slight change in behaviour in that we don't break out of the loop when copying in the attribute name fails. The previous behaviour was rather inconsistent here as it continued for any other kind of error, and that we don't clear the flags in the structure returned to userspace, a behavior only introduced as a bug fix in the last merge window. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Chandan Rajendra <chandanrlinux@gmail.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-02xfs: use strndup_user in XFS_IOC_ATTRMULTI_BY_HANDLEChristoph Hellwig
Simplify the user copy code by using strndup_user. This means that we now do one memory allocation per operation instead of one per ioctl, but memory allocations are cheap compared to the actual file system operations. Also the error for an invalid path is now EINVAL or EFAULT instead of the previous odd and undocumented ERANGE. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Chandan Rajendra <chandanrlinux@gmail.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-02xfs: merge xfs_attrmulti_attr_remove into xfs_attrmulti_attr_setChristoph Hellwig
Merge the ioctl handlers just like the low-level xfs_attr_set function. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Chandan Rajendra <chandanrlinux@gmail.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-02xfs: merge xfs_attr_remove into xfs_attr_setChristoph Hellwig
The Linux xattr and acl APIs use a single call for set and remove. Modify the high-level XFS API to match that and let xfs_attr_set handle removing attributes as well. With a little bit of reordering this removes a lot of code. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Chandan Rajendra <chandanrlinux@gmail.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-02xfs: remove the ATTR_INCOMPLETE flagChristoph Hellwig
Replace the ATTR_INCOMPLETE flag with a new boolean field in struct xfs_attr_list_context. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-02xfs: reject invalid flags combinations in XFS_IOC_ATTRLIST_BY_HANDLEChristoph Hellwig
While the flags field in the ABI and the on-disk format allows for multiple namespace flags, an attribute can only exist in a single namespace at a time. Hence asking to list attributes that exist in multiple namespaces simultaneously is a logically invalid request and will return no results. Reject this case early with -EINVAL. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Chandan Rajendra <chandanrlinux@gmail.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-02xfs: rework collapse range into an atomic operationBrian Foster
The collapse range operation uses a unique transaction and ilock cycle for the hole punch and each extent shift iteration of the overall operation. While the hole punch is safe as a separate operation due to the iolock, cycling the ilock after each extent shift is risky w.r.t. concurrent operations, similar to insert range. To avoid this problem, make collapse range atomic with respect to ilock. Hold the ilock across the entire operation, replace the individual transactions with a single rolling transaction sequence and finish dfops on each iteration to perform pending frees and roll the transaction. Remove the unnecessary quota reservation as collapse range can only ever merge extents (and thus remove extent records and potentially free bmap blocks). The dfops call automatically relogs the inode to keep it moving in the log. This guarantees that nothing else can change the extent mapping of an inode while a collapse range operation is in progress. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-02xfs: rework insert range into an atomic operationBrian Foster
The insert range operation uses a unique transaction and ilock cycle for the extent split and each extent shift iteration of the overall operation. While this works, it is risks racing with other operations in subtle ways such as COW writeback modifying an extent tree in the middle of a shift operation. To avoid this problem, make insert range atomic with respect to ilock. Hold the ilock across the entire operation, replace the individual transactions with a single rolling transaction sequence and relog the inode to keep it moving in the log. This guarantees that nothing else can change the extent mapping of an inode while an insert range operation is in progress. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Allison Collins <allison.henderson@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-02xfs: open code insert range extent split helperBrian Foster
The insert range operation currently splits the extent at the target offset in a separate transaction and lock cycle from the one that shifts extents. In preparation for reworking insert range into an atomic operation, lift the code into the caller so it can be easily condensed to a single rolling transaction and lock cycle and eliminate the helper. No functional changes. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Allison Collins <allison.henderson@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-02xfs: Add missing annotation to xfs_ail_check()Jules Irenge
Sparse reports a warning at xfs_ail_check() warning: context imbalance in xfs_ail_check() - unexpected unlock The root cause is the missing annotation at xfs_ail_check() Add the missing __must_hold(&ailp->ail_lock) annotation Signed-off-by: Jules Irenge <jbi.octave@gmail.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-02xfs: fix an undefined behaviour in _da3_path_shiftQian Cai
In xfs_da3_path_shift() "blk" can be assigned to state->path.blk[-1] if state->path.active is 1 (which is a valid state) when it tries to add an entry to a single dir leaf block and then to shift forward to see if there's a sibling block that would be a better place to put the new entry. This causes a UBSAN warning given negative array indices are undefined behavior in C. In practice the warning is entirely harmless given that "blk" is never dereferenced in this case, but it is still better to fix up the warning and slightly improve the code. UBSAN: Undefined behaviour in fs/xfs/libxfs/xfs_da_btree.c:1989:14 index -1 is out of range for type 'xfs_da_state_blk_t [5]' Call trace: dump_backtrace+0x0/0x2c8 show_stack+0x20/0x2c dump_stack+0xe8/0x150 __ubsan_handle_out_of_bounds+0xe4/0xfc xfs_da3_path_shift+0x860/0x86c [xfs] xfs_da3_node_lookup_int+0x7c8/0x934 [xfs] xfs_dir2_node_addname+0x2c8/0xcd0 [xfs] xfs_dir_createname+0x348/0x38c [xfs] xfs_create+0x6b0/0x8b4 [xfs] xfs_generic_create+0x12c/0x1f8 [xfs] xfs_vn_mknod+0x3c/0x4c [xfs] xfs_vn_create+0x34/0x44 [xfs] do_last+0xd4c/0x10c8 path_openat+0xbc/0x2f4 do_filp_open+0x74/0xf4 do_sys_openat2+0x98/0x180 __arm64_sys_openat+0xf8/0x170 do_el0_svc+0x170/0x240 el0_sync_handler+0x150/0x250 el0_sync+0x164/0x180 Suggested-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Qian Cai <cai@lca.pw> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-02xfs: ratelimit xfs_discard_page messagesChristoph Hellwig
Use printk_ratelimit() to limit the amount of messages printed from xfs_discard_page. Without that a failing device causes a large number of errors that doesn't really help debugging the underling issue. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-02xfs: ratelimit xfs_buf_ioerror_alert messagesChristoph Hellwig
Use printk_ratelimit() to limit the amount of messages printed from xfs_buf_ioerror_alert. Without that a failing device causes a large number of errors that doesn't really help debugging the underling issue. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-02xfs: remove the kuid/kgid conversion wrappersChristoph Hellwig
Remove the XFS wrappers for converting from and to the kuid/kgid types. Mostly this means switching to VFS i_{u,g}id_{read,write} helpers, but in a few spots the calls to the conversion functions is open coded. To match the use of sb->s_user_ns in the helpers and other file systems, sb->s_user_ns is also used in the quota code. The ACL code already does the conversion in a grotty layering violation in the VFS xattr code, so it keeps using init_user_ns for the identity mapping. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-02xfs: remove the icdinode di_uid/di_gid membersChristoph Hellwig
Use the Linux inode i_uid/i_gid members everywhere and just convert from/to the scalar value when reading or writing the on-disk inode. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-02xfs: ensure that the inode uid/gid match values match the icdinode onesChristoph Hellwig
Instead of only synchronizing the uid/gid values in xfs_setup_inode, ensure that they always match to prepare for removing the icdinode fields. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-02xfs: improve error message when we can't allocate memory for xfs_bufDarrick J. Wong
If xfs_buf_get_map can't allocate enough memory for the buffer it's trying to create, it'll cough up an error about not being able to allocate "pagesn". That's not particularly helpful (and if we're really out of memory the message is very spammy) so change the message to tell us how many pages were actually requested, and ratelimit it too. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-03-02xfs: add agf freeblocks verify in xfs_agf_verifyZheng Bin
We recently used fuzz(hydra) to test XFS and automatically generate tmp.img(XFS v5 format, but some metadata is wrong) xfs_repair information(just one AG): agf_freeblks 0, counted 3224 in ag 0 agf_longest 536874136, counted 3224 in ag 0 sb_fdblocks 613, counted 3228 Test as follows: mount tmp.img tmpdir cp file1M tmpdir sync In 4.19-stable, sync will stuck, the reason is: xfs_mountfs xfs_check_summary_counts if ((!xfs_sb_version_haslazysbcount(&mp->m_sb) || XFS_LAST_UNMOUNT_WAS_CLEAN(mp)) && !xfs_fs_has_sickness(mp, XFS_SICK_FS_COUNTERS)) return 0; -->just return, incore sb_fdblocks still be 613 xfs_initialize_perag_data cp file1M tmpdir -->ok(write file to pagecache) sync -->stuck(write pagecache to disk) xfs_map_blocks xfs_iomap_write_allocate while (count_fsb != 0) { nimaps = 0; while (nimaps == 0) { --> endless loop nimaps = 1; xfs_bmapi_write(..., &nimaps) --> nimaps becomes 0 again xfs_bmapi_write xfs_bmap_alloc xfs_bmap_btalloc xfs_alloc_vextent xfs_alloc_fix_freelist xfs_alloc_space_available -->fail(agf_freeblks is 0) In linux-next, sync not stuck, cause commit c2b3164320b5 ("xfs: use the latest extent at writeback delalloc conversion time") remove the above while, dmesg is as follows: [ 55.250114] XFS (loop0): page discard on page ffffea0008bc7380, inode 0x1b0c, offset 0. Users do not know why this page is discard, the better soultion is: 1. Like xfs_repair, make sure sb_fdblocks is equal to counted (xfs_initialize_perag_data did this, who is not called at this mount) 2. Add agf verify, if fail, will tell users to repair This patch use the second soultion. Signed-off-by: Zheng Bin <zhengbin13@huawei.com> Signed-off-by: Ren Xudong <renxudong1@huawei.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-02xfs: fix iclog release error check race with shutdownBrian Foster
Prior to commit df732b29c8 ("xfs: call xlog_state_release_iclog with l_icloglock held"), xlog_state_release_iclog() always performed a locked check of the iclog error state before proceeding into the sync state processing code. As of this commit, part of xlog_state_release_iclog() was open-coded into xfs_log_release_iclog() and as a result the locked error state check was lost. The lockless check still exists, but this doesn't account for the possibility of a race with a shutdown being performed by another task causing the iclog state to change while the original task waits on ->l_icloglock. This has reproduced very rarely via generic/475 and manifests as an assert failure in __xlog_state_release_iclog() due to an unexpected iclog state. Restore the locked error state check in xlog_state_release_iclog() to ensure that an iclog state update via shutdown doesn't race with the iclog release state processing code. Fixes: df732b29c807 ("xfs: call xlog_state_release_iclog with l_icloglock held") Reported-by: Zorro Lang <zlang@redhat.com> Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-01Merge tag 'ext4_for_linus_stable' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 fixes from Ted Ts'o: "Two more bug fixes (including a regression) for 5.6" * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: potential crash on allocation error in ext4_alloc_flex_bg_array() jbd2: fix data races at struct journal_head
2020-02-29ext4: potential crash on allocation error in ext4_alloc_flex_bg_array()Dan Carpenter
If sbi->s_flex_groups_allocated is zero and the first allocation fails then this code will crash. The problem is that "i--" will set "i" to -1 but when we compare "i >= sbi->s_flex_groups_allocated" then the -1 is type promoted to unsigned and becomes UINT_MAX. Since UINT_MAX is more than zero, the condition is true so we call kvfree(new_groups[-1]). The loop will carry on freeing invalid memory until it crashes. Fixes: 7c990728b99e ("ext4: fix potential race between s_flex_groups online resizing and access") Reviewed-by: Suraj Jitindar Singh <surajjs@amazon.com> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: stable@kernel.org Link: https://lore.kernel.org/r/20200228092142.7irbc44yaz3by7nb@kili.mountain Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-02-29jbd2: fix data races at struct journal_headQian Cai
journal_head::b_transaction and journal_head::b_next_transaction could be accessed concurrently as noticed by KCSAN, LTP: starting fsync04 /dev/zero: Can't open blockdev EXT4-fs (loop0): mounting ext3 file system using the ext4 subsystem EXT4-fs (loop0): mounted filesystem with ordered data mode. Opts: (null) ================================================================== BUG: KCSAN: data-race in __jbd2_journal_refile_buffer [jbd2] / jbd2_write_access_granted [jbd2] write to 0xffff99f9b1bd0e30 of 8 bytes by task 25721 on cpu 70: __jbd2_journal_refile_buffer+0xdd/0x210 [jbd2] __jbd2_journal_refile_buffer at fs/jbd2/transaction.c:2569 jbd2_journal_commit_transaction+0x2d15/0x3f20 [jbd2] (inlined by) jbd2_journal_commit_transaction at fs/jbd2/commit.c:1034 kjournald2+0x13b/0x450 [jbd2] kthread+0x1cd/0x1f0 ret_from_fork+0x27/0x50 read to 0xffff99f9b1bd0e30 of 8 bytes by task 25724 on cpu 68: jbd2_write_access_granted+0x1b2/0x250 [jbd2] jbd2_write_access_granted at fs/jbd2/transaction.c:1155 jbd2_journal_get_write_access+0x2c/0x60 [jbd2] __ext4_journal_get_write_access+0x50/0x90 [ext4] ext4_mb_mark_diskspace_used+0x158/0x620 [ext4] ext4_mb_new_blocks+0x54f/0xca0 [ext4] ext4_ind_map_blocks+0xc79/0x1b40 [ext4] ext4_map_blocks+0x3b4/0x950 [ext4] _ext4_get_block+0xfc/0x270 [ext4] ext4_get_block+0x3b/0x50 [ext4] __block_write_begin_int+0x22e/0xae0 __block_write_begin+0x39/0x50 ext4_write_begin+0x388/0xb50 [ext4] generic_perform_write+0x15d/0x290 ext4_buffered_write_iter+0x11f/0x210 [ext4] ext4_file_write_iter+0xce/0x9e0 [ext4] new_sync_write+0x29c/0x3b0 __vfs_write+0x92/0xa0 vfs_write+0x103/0x260 ksys_write+0x9d/0x130 __x64_sys_write+0x4c/0x60 do_syscall_64+0x91/0xb05 entry_SYSCALL_64_after_hwframe+0x49/0xbe 5 locks held by fsync04/25724: #0: ffff99f9911093f8 (sb_writers#13){.+.+}, at: vfs_write+0x21c/0x260 #1: ffff99f9db4c0348 (&sb->s_type->i_mutex_key#15){+.+.}, at: ext4_buffered_write_iter+0x65/0x210 [ext4] #2: ffff99f5e7dfcf58 (jbd2_handle){++++}, at: start_this_handle+0x1c1/0x9d0 [jbd2] #3: ffff99f9db4c0168 (&ei->i_data_sem){++++}, at: ext4_map_blocks+0x176/0x950 [ext4] #4: ffffffff99086b40 (rcu_read_lock){....}, at: jbd2_write_access_granted+0x4e/0x250 [jbd2] irq event stamp: 1407125 hardirqs last enabled at (1407125): [<ffffffff980da9b7>] __find_get_block+0x107/0x790 hardirqs last disabled at (1407124): [<ffffffff980da8f9>] __find_get_block+0x49/0x790 softirqs last enabled at (1405528): [<ffffffff98a0034c>] __do_softirq+0x34c/0x57c softirqs last disabled at (1405521): [<ffffffff97cc67a2>] irq_exit+0xa2/0xc0 Reported by Kernel Concurrency Sanitizer on: CPU: 68 PID: 25724 Comm: fsync04 Tainted: G L 5.6.0-rc2-next-20200221+ #7 Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019 The plain reads are outside of jh->b_state_lock critical section which result in data races. Fix them by adding pairs of READ|WRITE_ONCE(). Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Qian Cai <cai@lca.pw> Link: https://lore.kernel.org/r/20200222043111.2227-1-cai@lca.pw Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-02-28Merge tag 'io_uring-5.6-2020-02-28' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull io_uring fixes from Jens Axboe: - Fix for a race with IOPOLL used with SQPOLL (Xiaoguang) - Only show ->fdinfo if procfs is enabled (Tobias) - Fix for a chain with multiple personalities in the SQEs - Fix for a missing free of personality idr on exit - Removal of the spin-for-work optimization - Fix for next work lookup on request completion - Fix for non-vec read/write result progation in case of links - Fix for a fileset references on switch - Fix for a recvmsg/sendmsg 32-bit compatability mode * tag 'io_uring-5.6-2020-02-28' of git://git.kernel.dk/linux-block: io_uring: fix 32-bit compatability with sendmsg/recvmsg io_uring: define and set show_fdinfo only if procfs is enabled io_uring: drop file set ref put/get on switch io_uring: import_single_range() returns 0/-ERROR io_uring: pick up link work on submit reference drop io-wq: ensure work->task_pid is cleared on init io-wq: remove spin-for-work optimization io_uring: fix poll_list race for SETUP_IOPOLL|SETUP_SQPOLL io_uring: fix personality idr leak io_uring: handle multiple personalities in link chains
2020-02-28Merge tag 'zonefs-5.6-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs Pull zonefs fixes from Damien Le Moal: "Two fixes in here: - Revert the initial decision to silently ignore IOCB_NOWAIT for asynchronous direct IOs to sequential zone files. Instead, return an error to the user to signal that the feature is not supported (from Christoph) - A fix to zonefs Kconfig to select FS_IOMAP to avoid build failures if no other file system already selected this option (from Johannes)" * tag 'zonefs-5.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs: zonefs: select FS_IOMAP zonefs: fix IOCB_NOWAIT handling
2020-02-27io_uring: fix 32-bit compatability with sendmsg/recvmsgJens Axboe
We must set MSG_CMSG_COMPAT if we're in compatability mode, otherwise the iovec import for these commands will not do the right thing and fail the command with -EINVAL. Found by running the test suite compiled as 32-bit. Cc: stable@vger.kernel.org Fixes: aa1fa28fc73e ("io_uring: add support for recvmsg()") Fixes: 0fa03c624d8f ("io_uring: add support for sendmsg()") Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-02-27io_uring: define and set show_fdinfo only if procfs is enabledTobias Klauser
Follow the pattern used with other *_show_fdinfo functions and only define and use io_uring_show_fdinfo and its helper functions if CONFIG_PROC_FS is set. Fixes: 87ce955b24c9 ("io_uring: add ->show_fdinfo() for the io_uring file descriptor") Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-02-26io_uring: drop file set ref put/get on switchJens Axboe
Dan reports that he triggered a warning on ring exit doing some testing: percpu ref (io_file_data_ref_zero) <= 0 (0) after switching to atomic WARNING: CPU: 3 PID: 0 at lib/percpu-refcount.c:160 percpu_ref_switch_to_atomic_rcu+0xe8/0xf0 Modules linked in: CPU: 3 PID: 0 Comm: swapper/3 Not tainted 5.6.0-rc3+ #5648 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014 RIP: 0010:percpu_ref_switch_to_atomic_rcu+0xe8/0xf0 Code: e7 ff 55 e8 eb d2 80 3d bd 02 d2 00 00 75 8b 48 8b 55 d8 48 c7 c7 e8 70 e6 81 c6 05 a9 02 d2 00 01 48 8b 75 e8 e8 3a d0 c5 ff <0f> 0b e9 69 ff ff ff 90 55 48 89 fd 53 48 89 f3 48 83 ec 28 48 83 RSP: 0018:ffffc90000110ef8 EFLAGS: 00010292 RAX: 0000000000000045 RBX: 7fffffffffffffff RCX: 0000000000000000 RDX: 0000000000000045 RSI: ffffffff825be7a5 RDI: ffffffff825bc32c RBP: ffff8881b75eac38 R08: 000000042364b941 R09: 0000000000000045 R10: ffffffff825beb40 R11: ffffffff825be78a R12: 0000607e46005aa0 R13: ffff888107dcdd00 R14: 0000000000000000 R15: 0000000000000009 FS: 0000000000000000(0000) GS:ffff8881b9d80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f49e6a5ea20 CR3: 00000001b747c004 CR4: 00000000001606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <IRQ> rcu_core+0x1e4/0x4d0 __do_softirq+0xdb/0x2f1 irq_exit+0xa0/0xb0 smp_apic_timer_interrupt+0x60/0x140 apic_timer_interrupt+0xf/0x20 </IRQ> RIP: 0010:default_idle+0x23/0x170 Code: ff eb ab cc cc cc cc 0f 1f 44 00 00 41 54 55 53 65 8b 2d 10 96 92 7e 0f 1f 44 00 00 e9 07 00 00 00 0f 00 2d 21 d0 51 00 fb f4 <65> 8b 2d f6 95 92 7e 0f 1f 44 00 00 5b 5d 41 5c c3 65 8b 05 e5 95 Turns out that this is due to percpu_ref_switch_to_atomic() only grabbing a reference to the percpu refcount if it's not already in atomic mode. io_uring drops a ref and re-gets it when switching back to percpu mode. We attempt to protect against this with the FFD_F_ATOMIC bit, but that isn't reliable. We don't actually need to juggle these refcounts between atomic and percpu switch, we can just do them when we've switched to atomic mode. This removes the need for FFD_F_ATOMIC, which wasn't reliable. Fixes: 05f3fb3c5397 ("io_uring: avoid ring quiesce for fixed file set unregister and update") Reported-by: Dan Melnic <dmm@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-02-26io_uring: import_single_range() returns 0/-ERRORJens Axboe
Unlike the other core import helpers, import_single_range() returns 0 on success, not the length imported. This means that links that depend on the result of non-vec based IORING_OP_{READ,WRITE} that were added for 5.5 get errored when they should not be. Fixes: 3a6820f2bb8a ("io_uring: add non-vectored read/write commands") Signed-off-by: Jens Axboe <axboe@kernel.dk>