aboutsummaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2019-05-07cifs: smbd: take an array of reqeusts when sending upper layer dataLong Li
To support compounding, __smb_send_rqst() now sends an array of requests to the transport layer. Change smbd_send() to take an array of requests, and send them in as few packets as possible. Signed-off-by: Long Li <longli@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com> CC: Stable <stable@vger.kernel.org>
2019-05-07SMB3: Add handling for different FSCTL access flagsSteve French
DesiredAccess field in SMB3 open request needs to be set differently for READ vs. WRITE ioctls (not just ones that request both). Originally noticed by Pavel Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
2019-05-07cifs: Add support for FSCTL passthrough that write data to the serverRonnie Sahlberg
Add support to pass a blob to the server in FSCTL passthrough. Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2019-05-07cifs: remove superfluous inode_lock in cifs_{strict_}fsyncJeff Layton
Originally, filemap_write_and_wait took the i_mutex internally, but commit 02c24a82187d pushed the mutex acquisition into the individual fsync routines, leaving it up to the subsystem maintainers to remove it if it wasn't needed. For cifs, I see no reason to take the inode_lock here. All of the operations inside that lock are protected in other ways. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com> Acked-by: Pavel Shilovsky <pshilov@microsoft.com>
2019-05-07cifs: Call MID callback before destroying transportLong Li
When transport is being destroyed, it's possible that some processes may hold memory registrations that need to be deregistred. Call them first so nobody is using transport resources, and it can be destroyed. Signed-off-by: Long Li <longli@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2019-05-07cifs: smbd: Retry on memory registration failureLong Li
Memory registration failure doesn't mean this I/O has failed, it means the transport is hitting I/O error or needs reconnect. This error is not from the server. Indicate this error to upper layer, and let upper layer decide how to reconnect and proceed with this I/O. Signed-off-by: Long Li <longli@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2019-05-07cifs: smbd: Indicate to retry on transport sending failureLong Li
Failure to send a packet doesn't mean it's a permanent failure, it can't be returned to user process. This I/O should be retried or failed based on server packet response and transport health. This logic is handled by the upper layer. Give this decision to upper layer. Signed-off-by: Long Li <longli@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2019-05-07cifs: smbd: Return EINTR when interruptedLong Li
When packets are waiting for outbound I/O and interrupted, return the proper error code to user process. Signed-off-by: Long Li <longli@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2019-05-07cifs: smbd: Don't destroy transport on RDMA disconnectLong Li
Now upper layer is handling the transport shutdown and reconnect, remove the code that handling transport shutdown on RDMA disconnect. Signed-off-by: Long Li <longli@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2019-05-07smbd: Make upper layer decide when to destroy the transportLong Li
On transport recoonect, upper layer CIFS code destroys the current transport and then recoonect. This code path is not used by SMBD, in that SMBD destroys its transport on RDMA disconnect notification independent of CIFS upper layer behavior. This approach adds some costs to SMBD layer to handle transport shutdown and restart, and to deal with several racing conditions on reconnecting transport. Re-work this code path by introducing a new smbd_destroy. This function is called form upper layer to ask SMBD to destroy the transport. SMBD will no longer need to destroy the transport by itself while worrying about data transfer is in progress. The upper layer guarantees the transport is locked. change log: v2: fix build errors when CONFIG_CIFS_SMB_DIRECT is not configured Signed-off-by: Long Li <longli@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2019-05-07SMB3: update comment to clarify enumerating snapshotsSteve French
Trivial update to comment suggested by Pavel. Signed-off-by: Steve French <stfrench@microsoft.com>
2019-05-07CIFS: check CIFS_MOUNT_NO_DFS when trying to reuse existing sbAurelien Aptel
if we mount A then mount A again with nodfs, we shouldn't reuse the superblock. document the purpose of the defines as well. there are most likely more flags that needs to be added to this mask, in fact the logic to find them should be which flag should be *ignored* when trying to reuse an existing sb. Signed-off-by: Aurelien Aptel <aaptel@suse.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2019-05-07CIFS: Show locallease in /proc/mounts for cifs shares mounted with ↵Kenneth D'souza
locallease feature. Missing parameter that should be displayed in the mount list Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Kenneth D'souza <kdsouza@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2019-05-07cifs: Fix DFS cache refresher for DFS linksPaulo Alcantara (SUSE)
As per MS-DFSC, when a DFS cache entry is expired and it is a DFS link, then a new DFS referral must be sent to root server in order to refresh the expired entry. This patch ensures that all new DFS referrals for refreshing the cache are sent to DFS root. Signed-off-by: Paulo Alcantara (SUSE) <paulo@paulo.ac> Signed-off-by: Steve French <stfrench@microsoft.com>
2019-05-07cifs: don't use __constant_cpu_to_le32()Sergey Senozhatsky
A trivial patch. cpu_to_le32() is capable enough to detect __builtin_constant_p() and to use an appropriate compile time ___constant_swahb32() function. So we can use cpu_to_le32() instead of __constant_cpu_to_le32(). Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2019-05-07SMB3: Track total time spent on roundtrips for each SMB3 commandSteve French
Also track minimum and maximum time by command in /proc/fs/cifs/Stats Signed-off-by: Steve French <stfrench@microsoft.com>
2019-05-07Merge tag 'selinux-pr-20190507' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux Pull selinux updates from Paul Moore: "We've got a few SELinux patches for the v5.2 merge window, the highlights are below: - Add LSM hooks, and the SELinux implementation, for proper labeling of kernfs. While we are only including the SELinux implementation here, the rest of the LSM folks have given the hooks a thumbs-up. - Update the SELinux mdp (Make Dummy Policy) script to actually work on a modern system. - Disallow userspace to change the LSM credentials via /proc/self/attr when the task's credentials are already overridden. The change was made in procfs because all the LSM folks agreed this was the Right Thing To Do and duplicating it across each LSM was going to be annoying" * tag 'selinux-pr-20190507' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux: proc: prevent changes to overridden credentials selinux: Check address length before reading address family kernfs: fix xattr name handling in LSM helpers MAINTAINERS: update SELinux file patterns selinux: avoid uninitialized variable warning selinux: remove useless assignments LSM: lsm_hooks.h - fix missing colon in docstring selinux: Make selinux_kernfs_init_security static kernfs: initialize security of newly created nodes selinux: implement the kernfs_init_security hook LSM: add new hook for kernfs node initialization kernfs: use simple_xattrs for security attributes selinux: try security xattr after genfs for kernfs filesystems kernfs: do not alloc iattrs in kernfs_xattr_get kernfs: clean up struct kernfs_iattrs scripts/selinux: fix build selinux: use kernel linux/socket.h for genheaders and mdp scripts/selinux: modernize mdp
2019-05-07Merge tag 'for-5.2/io_uring-20190507' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull io_uring updates from Jens Axboe: "Set of changes/improvements for io_uring. This contains: - Fix of a shadowed variable (Colin) - Add support for draining commands (me) - Add support for sync_file_range() (me) - Add eventfd support (me) - cpu_online() fix (Shenghui) - Removal of a redundant ->error assignment (Stefan)" * tag 'for-5.2/io_uring-20190507' of git://git.kernel.dk/linux-block: io_uring: use cpu_online() to check p->sq_thread_cpu instead of cpu_possible() io_uring: fix shadowed variable ret return code being not checked req->error only used for iopoll io_uring: add support for eventfd notifications io_uring: add support for IORING_OP_SYNC_FILE_RANGE fs: add sync_file_range() helper io_uring: add support for marking commands as draining
2019-05-07Merge tag 'for-5.2/block-20190507' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block updates from Jens Axboe: "Nothing major in this series, just fixes and improvements all over the map. This contains: - Series of fixes for sed-opal (David, Jonas) - Fixes and performance tweaks for BFQ (via Paolo) - Set of fixes for bcache (via Coly) - Set of fixes for md (via Song) - Enabling multi-page for passthrough requests (Ming) - Queue release fix series (Ming) - Device notification improvements (Martin) - Propagate underlying device rotational status in loop (Holger) - Removal of mtip32xx trim support, which has been disabled for years (Christoph) - Improvement and cleanup of nvme command handling (Christoph) - Add block SPDX tags (Christoph) - Cleanup/hardening of bio/bvec iteration (Christoph) - A few NVMe pull requests (Christoph) - Removal of CONFIG_LBDAF (Christoph) - Various little fixes here and there" * tag 'for-5.2/block-20190507' of git://git.kernel.dk/linux-block: (164 commits) block: fix mismerge in bvec_advance block: don't drain in-progress dispatch in blk_cleanup_queue() blk-mq: move cancel of hctx->run_work into blk_mq_hw_sysfs_release blk-mq: always free hctx after request queue is freed blk-mq: split blk_mq_alloc_and_init_hctx into two parts blk-mq: free hw queue's resource in hctx's release handler blk-mq: move cancel of requeue_work into blk_mq_release blk-mq: grab .q_usage_counter when queuing request from plug code path block: fix function name in comment nvmet: protect discovery change log event list iteration nvme: mark nvme_core_init and nvme_core_exit static nvme: move command size checks to the core nvme-fabrics: check more command sizes nvme-pci: check more command sizes nvme-pci: remove an unneeded variable initialization nvme-pci: unquiesce admin queue on shutdown nvme-pci: shutdown on timeout during deletion nvme-pci: fix psdt field for single segment sgls nvme-multipath: don't print ANA group state by default nvme-multipath: split bios with the ns_head bio_set before submitting ...
2019-05-07Merge tag 'char-misc-5.2-rc1-part2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc update part 2 from Greg KH: "Here is the "real" big set of char/misc driver patches for 5.2-rc1 Loads of different driver subsystem stuff in here, all over the places: - thunderbolt driver updates - habanalabs driver updates - nvmem driver updates - extcon driver updates - intel_th driver updates - mei driver updates - coresight driver updates - soundwire driver cleanups and updates - fastrpc driver updates - other minor driver updates - chardev minor fixups Feels like this tree is getting to be a dumping ground of "small driver subsystems" these days. Which is fine with me, if it makes things easier for those subsystem maintainers. All of these have been in linux-next for a while with no reported issues" * tag 'char-misc-5.2-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (255 commits) intel_th: msu: Add current window tracking intel_th: msu: Add a sysfs attribute to trigger window switch intel_th: msu: Correct the block wrap detection intel_th: Add switch triggering support intel_th: gth: Factor out trace start/stop intel_th: msu: Factor out pipeline draining intel_th: msu: Switch over to scatterlist intel_th: msu: Replace open-coded list_{first,last,next}_entry variants intel_th: Only report useful IRQs to subdevices intel_th: msu: Start handling IRQs intel_th: pci: Use MSI interrupt signalling intel_th: Communicate IRQ via resource intel_th: Add "rtit" source device intel_th: Skip subdevices if their MMIO is missing intel_th: Rework resource passing between glue layers and core intel_th: SPDX-ify the documentation intel_th: msu: Fix single mode with IOMMU coresight: funnel: Support static funnel dt-bindings: arm: coresight: Unify funnel DT binding coresight: replicator: Add new device id for static replicator ...
2019-05-07Merge tag 'driver-core-5.2-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core/kobject updates from Greg KH: "Here is the "big" set of driver core patches for 5.2-rc1 There are a number of ACPI patches in here as well, as Rafael said they should go through this tree due to the driver core changes they required. They have all been acked by the ACPI developers. There are also a number of small subsystem-specific changes in here, due to some changes to the kobject core code. Those too have all been acked by the various subsystem maintainers. As for content, it's pretty boring outside of the ACPI changes: - spdx cleanups - kobject documentation updates - default attribute groups for kobjects - other minor kobject/driver core fixes All have been in linux-next for a while with no reported issues" * tag 'driver-core-5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (47 commits) kobject: clean up the kobject add documentation a bit more kobject: Fix kernel-doc comment first line kobject: Remove docstring reference to kset firmware_loader: Fix a typo ("syfs" -> "sysfs") kobject: fix dereference before null check on kobj Revert "driver core: platform: Fix the usage of platform device name(pdev->name)" init/config: Do not select BUILD_BIN2C for IKCONFIG Provide in-kernel headers to make extending kernel easier kobject: Improve doc clarity kobject_init_and_add() kobject: Improve docs for kobject_add/del driver core: platform: Fix the usage of platform device name(pdev->name) livepatch: Replace klp_ktype_patch's default_attrs with groups cpufreq: schedutil: Replace default_attrs field with groups padata: Replace padata_attr_type default_attrs field with groups irqdesc: Replace irq_kobj_type's default_attrs field with groups net-sysfs: Replace ktype default_attrs field with groups block: Replace all ktype default_attrs with groups samples/kobject: Replace foo_ktype's default_attrs field with groups kobject: Add support for default attribute groups to kobj_type driver core: Postpone DMA tear-down until after devres release for probe failure ...
2019-05-07Merge tag 'Wimplicit-fallthrough-5.2-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux Pull Wimplicit-fallthrough updates from Gustavo A. R. Silva: "Mark switch cases where we are expecting to fall through. This is part of the ongoing efforts to enable -Wimplicit-fallthrough. Most of them have been baking in linux-next for a whole development cycle. And with Stephen Rothwell's help, we've had linux-next nag-emails going out for newly introduced code that triggers -Wimplicit-fallthrough to avoid gaining more of these cases while we work to remove the ones that are already present. We are getting close to completing this work. Currently, there are only 32 of 2311 of these cases left to be addressed in linux-next. I'm auditing every case; I take a look into the code and analyze it in order to determine if I'm dealing with an actual bug or a false positive, as explained here: https://lore.kernel.org/lkml/c2fad584-1705-a5f2-d63c-824e9b96cf50@embeddedor.com/ While working on this, I've found and fixed the several missing break/return bugs, some of them introduced more than 5 years ago. Once this work is finished, we'll be able to universally enable "-Wimplicit-fallthrough" to avoid any of these kinds of bugs from entering the kernel again" * tag 'Wimplicit-fallthrough-5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux: (27 commits) memstick: mark expected switch fall-throughs drm/nouveau/nvkm: mark expected switch fall-throughs NFC: st21nfca: Fix fall-through warnings NFC: pn533: mark expected switch fall-throughs block: Mark expected switch fall-throughs ASN.1: mark expected switch fall-through lib/cmdline.c: mark expected switch fall-throughs lib: zstd: Mark expected switch fall-throughs scsi: sym53c8xx_2: sym_nvram: Mark expected switch fall-through scsi: sym53c8xx_2: sym_hipd: mark expected switch fall-throughs scsi: ppa: mark expected switch fall-through scsi: osst: mark expected switch fall-throughs scsi: lpfc: lpfc_scsi: Mark expected switch fall-throughs scsi: lpfc: lpfc_nvme: Mark expected switch fall-through scsi: lpfc: lpfc_nportdisc: Mark expected switch fall-through scsi: lpfc: lpfc_hbadisc: Mark expected switch fall-throughs scsi: lpfc: lpfc_els: Mark expected switch fall-throughs scsi: lpfc: lpfc_ct: Mark expected switch fall-throughs scsi: imm: mark expected switch fall-throughs scsi: csiostor: csio_wr: mark expected switch fall-through ...
2019-05-07Merge tag 'pidfd-v5.2-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux Pull pidfd updates from Christian Brauner: "This patchset makes it possible to retrieve pidfds at process creation time by introducing the new flag CLONE_PIDFD to the clone() system call. Linus originally suggested to implement this as a new flag to clone() instead of making it a separate system call. After a thorough review from Oleg CLONE_PIDFD returns pidfds in the parent_tidptr argument. This means we can give back the associated pid and the pidfd at the same time. Access to process metadata information thus becomes rather trivial. As has been agreed, CLONE_PIDFD creates file descriptors based on anonymous inodes similar to the new mount api. They are made unconditional by this patchset as they are now needed by core kernel code (vfs, pidfd) even more than they already were before (timerfd, signalfd, io_uring, epoll etc.). The core patchset is rather small. The bulky looking changelist is caused by David's very simple changes to Kconfig to make anon inodes unconditional. A pidfd comes with additional information in fdinfo if the kernel supports procfs. The fdinfo file contains the pid of the process in the callers pid namespace in the same format as the procfs status file, i.e. "Pid:\t%d". To remove worries about missing metadata access this patchset comes with a sample/test program that illustrates how a combination of CLONE_PIDFD and pidfd_send_signal() can be used to gain race-free access to process metadata through /proc/<pid>. Further work based on this patchset has been done by Joel. His work makes pidfds pollable. It finished too late for this merge window. I would prefer to have it sitting in linux-next for a while and send it for inclusion during the 5.3 merge window" * tag 'pidfd-v5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: samples: show race-free pidfd metadata access signal: support CLONE_PIDFD with pidfd_send_signal clone: add CLONE_PIDFD Make anon_inodes unconditional
2019-05-07Merge tag 'stream_open-5.2' of https://lab.nexedi.com/kirr/linuxLinus Torvalds
Pull stream_open conversion from Kirill Smelkov: - remove unnecessary double nonseekable_open from drivers/char/dtlk.c as noticed by Pavel Machek while reviewing nonseekable_open -> stream_open mass conversion. - the mass conversion patch promised in commit 10dce8af3422 ("fs: stream_open - opener for stream-like files so that read and write can run simultaneously without deadlock") and is automatically generated by running $ make coccicheck MODE=patch COCCI=scripts/coccinelle/api/stream_open.cocci I've verified each generated change manually - that it is correct to convert - and each other nonseekable_open instance left - that it is either not correct to convert there, or that it is not converted due to current stream_open.cocci limitations. More details on this in the patch. - finally, change VFS to pass ppos=NULL into .read/.write for files that declare themselves streams. It was suggested by Rasmus Villemoes and makes sure that if ppos starts to be erroneously used in a stream file, such bug won't go unnoticed and will produce an oops instead of creating illusion of position change being taken into account. Note: this patch does not conflict with "fuse: Add FOPEN_STREAM to use stream_open()" that will be hopefully coming via FUSE tree, because fs/fuse/ uses new-style .read_iter/.write_iter, and for these accessors position is still passed as non-pointer kiocb.ki_pos . * tag 'stream_open-5.2' of https://lab.nexedi.com/kirr/linux: vfs: pass ppos=NULL to .read()/.write() of FMODE_STREAM files *: convert stream-like files from nonseekable_open -> stream_open dtlk: remove double call to nonseekable_open
2019-05-07Merge tag 'xfs-5.2-merge-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds
Pull xfs updates from Darrick Wong: "Here's a big pile of new stuff for XFS for 5.2. XFS has grown the ability to report metadata health status to userspace after online fsck checks the filesystem. The online metadata checking code is (I really hope) feature complete with the addition of checks for the global fs counters, though it'll remain EXPERIMENTAL for now. There are also fixes for thundering herds of writeback completions and some other deadlocks, fixes for theoretical integer overflow attacks on space accounting, and removal of the long-defunct 'mntpt' option which was deprecated in the mid-2000s and (it turns out) totally broken since 2011 (and nobody complained...). Summary: - Fix some more buffer deadlocks when performing an unmount after a hard shutdown. - Fix some minor space accounting issues. - Fix some use after free problems. - Make the (undocumented) FITRIM behavior consistent with other filesystems. - Embiggen the xfs geometry ioctl's data structure. - Introduce a new AG geometry ioctl. - Introduce a new online health reporting infrastructure and ioctl for userspace to query a filesystem's health status. - Enhance online scrub and repair to update the health reports. - Reduce thundering herd problems when writeback io completes. - Fix some transaction reservation type errors. - Fix integer overflow problems with delayed alloc reservation counters. - Fix some problems where we would exit to userspace without unlocking. - Fix inconsistent behavior when finishing deferred ops fails. - Strengthen scrub to check incore data against ondisk metadata. - Remove long-broken mntpt mount option. - Add an online scrub function for the filesystem summary counters, which should make online metadata scrub more or less feature complete for now. - Various cleanups" * tag 'xfs-5.2-merge-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (38 commits) xfs: change some error-less functions to void types xfs: add online scrub for superblock counters xfs: don't parse the mtpt mount option xfs: always rejoin held resources during defer roll xfs: add missing error check in xfs_prepare_shift() xfs: scrub should check incore counters against ondisk headers xfs: allow scrubbers to pause background reclaim xfs: rename the speculative block allocation reclaim toggle functions xfs: track delayed allocation reservations across the filesystem xfs: fix broken bhold behavior in xrep_roll_ag_trans xfs: unlock inode when xfs_ioctl_setattr_get_trans can't get transaction xfs: kill the xfs_dqtrx_t typedef xfs: widen inode delalloc block counter to 64-bits xfs: widen quota block counters to 64-bit integers xfs: abort unaligned nowait directio early xfs: assert that we don't enter agfl freeing with a non-permanent transaction xfs: make tr_growdata a permanent transaction xfs: merge adjacent io completions of the same type xfs: remove unused m_data_workqueue xfs: implement per-inode writeback completion queues ...
2019-05-07Merge tag 'iomap-5.2-merge-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds
Pull iomap updates from Darrick Wong: "Nothing particularly exciting here, just adding some callouts for gfs2 and cleaning a few things. Summary: - Add some extra hooks to the iomap buffered write path to enable gfs2 journalled writes - SPDX conversion - Various refactoring" * tag 'iomap-5.2-merge-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: iomap: move iomap_read_inline_data around iomap: Add a page_prepare callback iomap: Fix use-after-free error in page_done callback fs: Turn __generic_write_end into a void function iomap: Clean up __generic_write_end calling iomap: convert to SPDX identifier
2019-05-07Merge tag 'jfs-5.2' of git://github.com/kleikamp/linux-shaggyLinus Torvalds
Pull jfs updates from Dave Kleikamp: "Several minor jfs fixes" * tag 'jfs-5.2' of git://github.com/kleikamp/linux-shaggy: jfs: fix bogus variable self-initialization fs/jfs: Switch to use new generic UUID API jfs: compare old and new mode before setting update_mode flag jfs: remove incorrect comment in jfs_superblock jfs: fix spelling mistake, EACCESS -> EACCES
2019-05-07Merge tag 'for-5.2-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs updates from David Sterba: "This time the majority of changes are cleanups, though there's still a number of changes of user interest. User visible changes: - better read time and write checks to catch errors early and before writing data to disk (to catch potential memory corruption on data that get checksummed) - qgroups + metadata relocation: last speed up patch int the series to address the slowness, there should be no overhead comparing balance with and without qgroups - FIEMAP ioctl does not start a transaction unnecessarily, this can result in a speed up and less blocking due to IO - LOGICAL_INO (v1, v2) does not start transaction unnecessarily, this can speed up the mentioned ioctl and scrub as well - fsync on files with many (but not too many) hardlinks is faster, finer decision if the links should be fsynced individually or completely - send tries harder to find ranges to clone - trim/discard will skip unallocated chunks that haven't been touched since the last mount Fixes: - send flushes delayed allocation before start, otherwise it could miss some changes in case of a very recent rw->ro switch of a subvolume - fix fallocate with qgroups that could lead to space accounting underflow, reported as a warning - trim/discard ioctl honours the requested range - starting send and dedupe on a subvolume at the same time will let only one of them succeed, this is to prevent changes that send could miss due to dedupe; both operations are restartable Core changes: - more tree-checker validations, errors reported by fuzzing tools: - device item - inode item - block group profiles - tracepoints for extent buffer locking - async cow preallocates memory to avoid errors happening too deep in the call chain - metadata reservations for delalloc reworked to better adapt in many-writers/low-space scenarios - improved space flushing logic for intense DIO vs buffered workloads - lots of cleanups - removed unused struct members - redundant argument removal - properties and xattrs - extent buffer locking - selftests - use common file type conversions - many-argument functions reduction" * tag 'for-5.2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (227 commits) btrfs: Use kvmalloc for allocating compressed path context btrfs: Factor out common extent locking code in submit_compressed_extents btrfs: Set io_tree only once in submit_compressed_extents btrfs: Replace clear_extent_bit with unlock_extent btrfs: Make compress_file_range take only struct async_chunk btrfs: Remove fs_info from struct async_chunk btrfs: Rename async_cow to async_chunk btrfs: Preallocate chunks in cow_file_range_async btrfs: reserve delalloc metadata differently btrfs: track DIO bytes in flight btrfs: merge calls of btrfs_setxattr and btrfs_setxattr_trans in btrfs_set_prop btrfs: delete unused function btrfs_set_prop_trans btrfs: start transaction in xattr_handler_set_prop btrfs: drop local copy of inode i_mode btrfs: drop old_fsflags in btrfs_ioctl_setflags btrfs: modify local copy of btrfs_inode flags btrfs: drop useless inode i_flags copy and restore btrfs: start transaction in btrfs_ioctl_setflags() btrfs: export btrfs_set_prop btrfs: refactor btrfs_set_props to validate externally ...
2019-05-07Merge branch 'stable-fodder' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs stable fodder fixes from Al Viro: - acct_on() fix for deadlock caught by overlayfs folks - autofs RCU use-after-free SNAFU (->d_manage() can be called locklessly, so we need to RCU-delay freeing the objects it looks at) - (hopefully) the end of "do we need freeing this dentry RCU-delayed" whack-a-mole. * 'stable-fodder' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: autofs: fix use-after-free in lockless ->d_manage() dcache: sort the freeing-without-RCU-delay mess for good. acct_on(): don't mess with freeze protection
2019-05-07Merge branch 'work.icache' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs inode freeing updates from Al Viro: "Introduction of separate method for RCU-delayed part of ->destroy_inode() (if any). Pretty much as posted, except that destroy_inode() stashes ->free_inode into the victim (anon-unioned with ->i_fops) before scheduling i_callback() and the last two patches (sockfs conversion and folding struct socket_wq into struct socket) are excluded - that pair should go through netdev once davem reopens his tree" * 'work.icache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (58 commits) orangefs: make use of ->free_inode() shmem: make use of ->free_inode() hugetlb: make use of ->free_inode() overlayfs: make use of ->free_inode() jfs: switch to ->free_inode() fuse: switch to ->free_inode() ext4: make use of ->free_inode() ecryptfs: make use of ->free_inode() ceph: use ->free_inode() btrfs: use ->free_inode() afs: switch to use of ->free_inode() dax: make use of ->free_inode() ntfs: switch to ->free_inode() securityfs: switch to ->free_inode() apparmor: switch to ->free_inode() rpcpipe: switch to ->free_inode() bpf: switch to ->free_inode() mqueue: switch to ->free_inode() ufs: switch to ->free_inode() coda: switch to ->free_inode() ...
2019-05-07Merge tag 'printk-for-5.2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk Pull printk updates from Petr Mladek: - Allow state reset of printk_once() calls. - Prevent crashes when dereferencing invalid pointers in vsprintf(). Only the first byte is checked for simplicity. - Make vsprintf warnings consistent and inlined. - Treewide conversion of obsolete %pf, %pF to %ps, %pF printf modifiers. - Some clean up of vsprintf and test_printf code. * tag 'printk-for-5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk: lib/vsprintf: Make function pointer_string static vsprintf: Limit the length of inlined error messages vsprintf: Avoid confusion between invalid address and value vsprintf: Prevent crash when dereferencing invalid pointers vsprintf: Consolidate handling of unknown pointer specifiers vsprintf: Factor out %pO handler as kobject_string() vsprintf: Factor out %pV handler as va_format() vsprintf: Factor out %p[iI] handler as ip_addr_string() vsprintf: Do not check address of well-known strings vsprintf: Consistent %pK handling for kptr_restrict == 0 vsprintf: Shuffle restricted_pointer() printk: Tie printk_once / printk_deferred_once into .data.once for reset treewide: Switch printk users from %pf and %pF to %ps and %pS, respectively lib/test_printf: Switch to bitmap_zalloc()
2019-05-07io_uring: use cpu_online() to check p->sq_thread_cpu instead of cpu_possible()Shenghui Wang
This issue is found by running liburing/test/io_uring_setup test. When test run, the testcase "attempt to bind to invalid cpu" would not pass with messages like: io_uring_setup(1, 0xbfc2f7c8), \ flags: IORING_SETUP_SQPOLL|IORING_SETUP_SQ_AFF, \ resv: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000, \ sq_thread_cpu: 2 expected -1, got 3 FAIL On my system, there is: CPU(s) possible : 0-3 CPU(s) online : 0-1 CPU(s) offline : 2-3 CPU(s) present : 0-1 The sq_thread_cpu 2 is offline on my system, so the bind should fail. But cpu_possible() will pass the check. We shouldn't be able to bind to an offline cpu. Use cpu_online() to do the check. After the change, the testcase run as expected: EINVAL will be returned for cpu offlined. Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Shenghui Wang <shhuiw@foxmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-05-06Merge branch 'linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto update from Herbert Xu: "API: - Add support for AEAD in simd - Add fuzz testing to testmgr - Add panic_on_fail module parameter to testmgr - Use per-CPU struct instead multiple variables in scompress - Change verify API for akcipher Algorithms: - Convert x86 AEAD algorithms over to simd - Forbid 2-key 3DES in FIPS mode - Add EC-RDSA (GOST 34.10) algorithm Drivers: - Set output IV with ctr-aes in crypto4xx - Set output IV in rockchip - Fix potential length overflow with hashing in sun4i-ss - Fix computation error with ctr in vmx - Add SM4 protected keys support in ccree - Remove long-broken mxc-scc driver - Add rfc4106(gcm(aes)) cipher support in cavium/nitrox" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (179 commits) crypto: ccree - use a proper le32 type for le32 val crypto: ccree - remove set but not used variable 'du_size' crypto: ccree - Make cc_sec_disable static crypto: ccree - fix spelling mistake "protedcted" -> "protected" crypto: caam/qi2 - generate hash keys in-place crypto: caam/qi2 - fix DMA mapping of stack memory crypto: caam/qi2 - fix zero-length buffer DMA mapping crypto: stm32/cryp - update to return iv_out crypto: stm32/cryp - remove request mutex protection crypto: stm32/cryp - add weak key check for DES crypto: atmel - remove set but not used variable 'alg_name' crypto: picoxcell - Use dev_get_drvdata() crypto: crypto4xx - get rid of redundant using_sd variable crypto: crypto4xx - use sync skcipher for fallback crypto: crypto4xx - fix cfb and ofb "overran dst buffer" issues crypto: crypto4xx - fix ctr-aes missing output IV crypto: ecrdsa - select ASN1 and OID_REGISTRY for EC-RDSA crypto: ux500 - use ccflags-y instead of CFLAGS_<basename>.o crypto: ccree - handle tee fips error during power management resume crypto: ccree - add function to handle cryptocell tee fips error ...
2019-05-06Merge branch 'core-stacktrace-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull stack trace updates from Ingo Molnar: "So Thomas looked at the stacktrace code recently and noticed a few weirdnesses, and we all know how such stories of crummy kernel code meeting German engineering perfection end: a 45-patch series to clean it all up! :-) Here's the changes in Thomas's words: 'Struct stack_trace is a sinkhole for input and output parameters which is largely pointless for most usage sites. In fact if embedded into other data structures it creates indirections and extra storage overhead for no benefit. Looking at all usage sites makes it clear that they just require an interface which is based on a storage array. That array is either on stack, global or embedded into some other data structure. Some of the stack depot usage sites are outright wrong, but fortunately the wrongness just causes more stack being used for nothing and does not have functional impact. Another oddity is the inconsistent termination of the stack trace with ULONG_MAX. It's pointless as the number of entries is what determines the length of the stored trace. In fact quite some call sites remove the ULONG_MAX marker afterwards with or without nasty comments about it. Not all architectures do that and those which do, do it inconsistenly either conditional on nr_entries == 0 or unconditionally. The following series cleans that up by: 1) Removing the ULONG_MAX termination in the architecture code 2) Removing the ULONG_MAX fixups at the call sites 3) Providing plain storage array based interfaces for stacktrace and stackdepot. 4) Cleaning up the mess at the callsites including some related cleanups. 5) Removing the struct stack_trace based interfaces This is not changing the struct stack_trace interfaces at the architecture level, but it removes the exposure to the generic code'" * 'core-stacktrace-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (45 commits) x86/stacktrace: Use common infrastructure stacktrace: Provide common infrastructure lib/stackdepot: Remove obsolete functions stacktrace: Remove obsolete functions livepatch: Simplify stack trace retrieval tracing: Remove the last struct stack_trace usage tracing: Simplify stack trace retrieval tracing: Make ftrace_trace_userstack() static and conditional tracing: Use percpu stack trace buffer more intelligently tracing: Simplify stacktrace retrieval in histograms lockdep: Simplify stack trace handling lockdep: Remove save argument from check_prev_add() lockdep: Remove unused trace argument from print_circular_bug() drm: Simplify stacktrace handling dm persistent data: Simplify stack trace handling dm bufio: Simplify stack trace retrieval btrfs: ref-verify: Simplify stack trace retrieval dma/debug: Simplify stracktrace retrieval fault-inject: Simplify stacktrace retrieval mm/page_owner: Simplify stack trace handling ...
2019-05-06io_uring: fix shadowed variable ret return code being not checkedColin Ian King
Currently variable ret is declared in a while-loop code block that shadows another variable ret. When an error occurs in the while-loop the error return in ret is not being set in the outer code block and so the error check on ret is always going to be checking on the wrong ret variable resulting in check that is always going to be true and a premature return occurs. Fix this by removing the declaration of the inner while-loop variable ret so that shadowing does not occur. Addresses-Coverity: ("'Constant' variable guards dead code") Fixes: 6b06314c47e1 ("io_uring: add file set registration") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-05-06vfs: pass ppos=NULL to .read()/.write() of FMODE_STREAM filesKirill Smelkov
This amends commit 10dce8af3422 ("fs: stream_open - opener for stream-like files so that read and write can run simultaneously without deadlock") in how position is passed into .read()/.write() handler for stream-like files: Rasmus noticed that we currently pass 0 as position and ignore any position change if that is done by a file implementation. This papers over bugs if ppos is used in files that declare themselves as being stream-like as such bugs will go unnoticed. Even if a file implementation is correctly converted into using stream_open, its read/write later could be changed to use ppos and even though that won't be working correctly, that bug might go unnoticed without someone doing wrong behaviour analysis. It is thus better to pass ppos=NULL into read/write for stream-like files as that don't give any chance for ppos usage bugs because it will oops if ppos is ever used inside .read() or .write(). Note 1: rw_verify_area, new_sync_{read,write} needs to be updated because they are called by vfs_read/vfs_write & friends before file_operations .read/.write . Note 2: if file backend uses new-style .read_iter/.write_iter, position is still passed into there as non-pointer kiocb.ki_pos . Currently stream_open.cocci (semantic patch added by 10dce8af3422) ignores files whose file_operations has *_iter methods. Suggested-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Kirill Smelkov <kirr@nexedi.com>
2019-05-05Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds
Pull vfs fixes from Al Viro: - a couple of ->i_link use-after-free fixes - regression fix for wrong errno on absent device name in mount(2) (this cycle stuff) - ancient UFS braino in large GID handling on Solaris UFS images (bogus cut'n'paste from large UID handling; wrong field checked to decide whether we should look at old (16bit) or new (32bit) field) * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: ufs: fix braino in ufs_get_inode_gid() for solaris UFS flavour Abort file_remove_privs() for non-reg. files [fix] get rid of checking for absent device name in vfs_get_tree() apparmorfs: fix use-after-free on symlink traversal securityfs: fix use-after-free on symlink traversal
2019-05-02req->error only used for iopollStefan Bühler
No need to set it in io_poll_add; io_poll_complete doesn't use it to set the result in the CQE. Signed-off-by: Stefan Bühler <source@stbuehler.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-05-02io_uring: add support for eventfd notificationsJens Axboe
Allow registration of an eventfd, which will trigger an event every time a completion event happens for this io_uring instance. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-05-02io_uring: add support for IORING_OP_SYNC_FILE_RANGEJens Axboe
This behaves just like sync_file_range(2) does. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-05-02fs: add sync_file_range() helperJens Axboe
This just pulls out the ksys_sync_file_range() code to work on a struct file instead of an fd, so we can use it elsewhere. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-05-02io_uring: add support for marking commands as drainingJens Axboe
There are no ordering constraints between the submission and completion side of io_uring. But sometimes that would be useful to have. One common example is doing an fsync, for instance, and have it ordered with previous writes. Without support for that, the application must do this tracking itself. This adds a general SQE flag, IOSQE_IO_DRAIN. If a command is marked with this flag, then it will not be issued before previous commands have completed, and subsequent commands submitted after the drain will not be issued before the drain is started.. If there are no pending commands, setting this flag will not change the behavior of the issue of the command. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-05-02Merge tag 'for-linus-20190502' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull io_uring fixes from Jens Axboe: "This is mostly io_uring fixes/tweaks. Most of these were actually done in time for the last -rc, but I wanted to ensure that everything tested out great before including them. The code delta looks larger than it really is, as it's mostly just comment additions/changes. Outside of the comment additions/changes, this is mostly removal of unnecessary barriers. In all, this pull request contains: - Tweak to how we handle errors at submission time. We now post a completion event if the error occurs on behalf of an sqe, instead of returning it through the system call. If the error happens outside of a specific sqe, we return the error through the system call. This makes it nicer to use and makes the "normal" use case behave the same as the offload cases. (me) - Fix for a missing req reference drop from async context (me) - If an sqe is submitted with RWF_NOWAIT, don't punt it to async context. Return -EAGAIN directly, instead of using it as a hint to do async punt. (Stefan) - Fix notes on barriers (Stefan) - Remove unnecessary barriers (Stefan) - Fix potential double free of memory in setup error (Mark) - Further improve sq poll CPU validation (Mark) - Fix page allocation warning and leak on buffer registration error (Mark) - Fix iov_iter_type() for new no-ref flag (Ming) - Fix a case where dio doesn't honor bio no-page-ref (Ming)" * tag 'for-linus-20190502' of git://git.kernel.dk/linux-block: io_uring: avoid page allocation warnings iov_iter: fix iov_iter_type block: fix handling for BIO_NO_PAGE_REF io_uring: drop req submit reference always in async punt io_uring: free allocated io_memory once io_uring: fix SQPOLL cpu validation io_uring: have submission side sqe errors post a cqe io_uring: remove unnecessary barrier after unsetting IORING_SQ_NEED_WAKEUP io_uring: remove unnecessary barrier after incrementing dropped counter io_uring: remove unnecessary barrier before reading SQ tail io_uring: remove unnecessary barrier after updating SQ head io_uring: remove unnecessary barrier before reading cq head io_uring: remove unnecessary barrier before wq_has_sleeper io_uring: fix notes on barriers io_uring: fix handling SQEs requesting NOWAIT
2019-05-02btrfs: Use kvmalloc for allocating compressed path contextNikolay Borisov
Recent refactoring of cow_file_range_async means it's now possible to request a rather large physically contiguous memory via kmalloc. The size is dependent on the number of 512k chunks that the compressed range consists of. David reported multiple OOM messages on such large allocations. Fix it by switching to using kvmalloc. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-05-02btrfs: Factor out common extent locking code in submit_compressed_extentsNikolay Borisov
Irrespective of whether the compress code fell back to uncompressed or a compressed extent has to be submitted, the extent range is always locked. So factor out the common lock_extent call at the beginning of the loop. No functional changes just removes one duplicate lock_extent call. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-05-02btrfs: Set io_tree only once in submit_compressed_extentsNikolay Borisov
The inode never changes so it's sufficient to dereference it and get the iotree only once, before the execution of the main loop. No functional changes, only the size of the function is decreased: add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-44 (-44) Function old new delta submit_compressed_extents 1240 1196 -44 Total: Before=88476, After=88432, chg -0.05% Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-05-02btrfs: Replace clear_extent_bit with unlock_extentNikolay Borisov
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-05-02btrfs: Make compress_file_range take only struct async_chunkNikolay Borisov
All context this function needs is held within struct async_chunk. Currently we not only pass the struct but also every individual member. This is redundant, simplify it by only passing struct async_chunk and leaving it to compress_file_range to extract the values it requires. No functional changes. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-05-02btrfs: Remove fs_info from struct async_chunkNikolay Borisov
The associated btrfs_work already contains a reference to the fs_info so use that instead of passing it via async_chunk. No functional changes. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-05-02btrfs: Rename async_cow to async_chunkNikolay Borisov
Now that we have an explicit async_chunk struct rename references to variables of this type to async_chunk. No functional changes. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>