aboutsummaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2020-06-08Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge still more updates from Andrew Morton: "Various trees. Mainly those parts of MM whose linux-next dependents are now merged. I'm still sitting on ~160 patches which await merges from -next. Subsystems affected by this patch series: mm/proc, ipc, dynamic-debug, panic, lib, sysctl, mm/gup, mm/pagemap" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (52 commits) doc: cgroup: update note about conditions when oom killer is invoked module: move the set_fs hack for flush_icache_range to m68k nommu: use flush_icache_user_range in brk and mmap binfmt_flat: use flush_icache_user_range exec: use flush_icache_user_range in read_code exec: only build read_code when needed m68k: implement flush_icache_user_range arm: rename flush_cache_user_range to flush_icache_user_range xtensa: implement flush_icache_user_range sh: implement flush_icache_user_range asm-generic: add a flush_icache_user_range stub mm: rename flush_icache_user_range to flush_icache_user_page arm,sparc,unicore32: remove flush_icache_user_range riscv: use asm-generic/cacheflush.h powerpc: use asm-generic/cacheflush.h openrisc: use asm-generic/cacheflush.h m68knommu: use asm-generic/cacheflush.h microblaze: use asm-generic/cacheflush.h ia64: use asm-generic/cacheflush.h hexagon: use asm-generic/cacheflush.h ...
2020-06-08binfmt_flat: use flush_icache_user_rangeChristoph Hellwig
load_flat_file works on user addresses. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Greg Ungerer <gerg@linux-m68k.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Link: http://lkml.kernel.org/r/20200515143646.3857579-28-hch@lst.de Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-08exec: use flush_icache_user_range in read_codeChristoph Hellwig
read_code operates on user addresses. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Link: http://lkml.kernel.org/r/20200515143646.3857579-27-hch@lst.de Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-08exec: only build read_code when neededChristoph Hellwig
Only build read_code when binary formats that use it are built into the kernel. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Link: http://lkml.kernel.org/r/20200515143646.3857579-26-hch@lst.de Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-08kernel/watchdog.c: convert {soft/hard}lockup boot parameters to sysctl aliasesGuilherme G. Piccoli
After a recent change introduced by Vlastimil's series [0], kernel is able now to handle sysctl parameters on kernel command line; also, the series introduced a simple infrastructure to convert legacy boot parameters (that duplicate sysctls) into sysctl aliases. This patch converts the watchdog parameters softlockup_panic and {hard,soft}lockup_all_cpu_backtrace to use the new alias infrastructure. It fixes the documentation too, since the alias only accepts values 0 or 1, not the full range of integers. We also took the opportunity here to improve the documentation of the previously converted hung_task_panic (see the patch series [0]) and put the alias table in alphabetical order. [0] http://lkml.kernel.org/r/20200427180433.7029-1-vbabka@suse.cz Signed-off-by: Guilherme G. Piccoli <gpiccoli@canonical.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Kees Cook <keescook@chromium.org> Cc: Iurii Zaikin <yzaikin@google.com> Cc: Luis Chamberlain <mcgrof@kernel.org> Link: http://lkml.kernel.org/r/20200507214624.21911-1-gpiccoli@canonical.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-08kernel/hung_task convert hung_task_panic boot parameter to sysctlVlastimil Babka
We can now handle sysctl parameters on kernel command line and have infrastructure to convert legacy command line options that duplicate sysctl to become a sysctl alias. This patch converts the hung_task_panic parameter. Note that the sysctl handler is more strict and allows only 0 and 1, while the legacy parameter allowed any non-zero value. But there is little reason anyone would not be using 1. Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Kees Cook <keescook@chromium.org> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Christian Brauner <christian.brauner@ubuntu.com> Cc: David Rientjes <rientjes@google.com> Cc: "Eric W . Biederman" <ebiederm@xmission.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Guilherme G . Piccoli" <gpiccoli@canonical.com> Cc: Iurii Zaikin <yzaikin@google.com> Cc: Ivan Teterevkov <ivan.teterevkov@nutanix.com> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20200427180433.7029-4-vbabka@suse.cz Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-08kernel/sysctl: support handling command line aliasesVlastimil Babka
We can now handle sysctl parameters on kernel command line, but historically some parameters introduced their own command line equivalent, which we don't want to remove for compatibility reasons. We can, however, convert them to the generic infrastructure with a table translating the legacy command line parameters to their sysctl names, and removing the one-off param handlers. This patch adds the support and makes the first conversion to demonstrate it, on the (deprecated) numa_zonelist_order parameter. Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Acked-by: Kees Cook <keescook@chromium.org> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Christian Brauner <christian.brauner@ubuntu.com> Cc: David Rientjes <rientjes@google.com> Cc: "Eric W . Biederman" <ebiederm@xmission.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Guilherme G . Piccoli" <gpiccoli@canonical.com> Cc: Iurii Zaikin <yzaikin@google.com> Cc: Ivan Teterevkov <ivan.teterevkov@nutanix.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20200427180433.7029-3-vbabka@suse.cz Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-08kernel/sysctl: support setting sysctl parameters from kernel command lineVlastimil Babka
Patch series "support setting sysctl parameters from kernel command line", v3. This series adds support for something that seems like many people always wanted but nobody added it yet, so here's the ability to set sysctl parameters via kernel command line options in the form of sysctl.vm.something=1 The important part is Patch 1. The second, not so important part is an attempt to clean up legacy one-off parameters that do the same thing as a sysctl. I don't want to remove them completely for compatibility reasons, but with generic sysctl support the idea is to remove the one-off param handlers and treat the parameters as aliases for the sysctl variants. I have identified several parameters that mention sysctl counterparts in Documentation/admin-guide/kernel-parameters.txt but there might be more. The conversion also has varying level of success: - numa_zonelist_order is converted in Patch 2 together with adding the necessary infrastructure. It's easy as it doesn't really do anything but warn on deprecated value these days. - hung_task_panic is converted in Patch 3, but there's a downside that now it only accepts 0 and 1, while previously it was any integer value - nmi_watchdog maps to two sysctls nmi_watchdog and hardlockup_panic, so there's no straighforward conversion possible - traceoff_on_warning is a flag without value and it would be required to handle that somehow in the conversion infractructure, which seems pointless for a single flag This patch (of 5): A recently proposed patch to add vm_swappiness command line parameter in addition to existing sysctl [1] made me wonder why we don't have a general support for passing sysctl parameters via command line. Googling found only somebody else wondering the same [2], but I haven't found any prior discussion with reasons why not to do this. Settings the vm_swappiness issue aside (the underlying issue might be solved in a different way), quick search of kernel-parameters.txt shows there are already some that exist as both sysctl and kernel parameter - hung_task_panic, nmi_watchdog, numa_zonelist_order, traceoff_on_warning. A general mechanism would remove the need to add more of those one-offs and might be handy in situations where configuration by e.g. /etc/sysctl.d/ is impractical. Hence, this patch adds a new parse_args() pass that looks for parameters prefixed by 'sysctl.' and tries to interpret them as writes to the corresponding sys/ files using an temporary in-kernel procfs mount. This mechanism was suggested by Eric W. Biederman [3], as it handles all dynamically registered sysctl tables, even though we don't handle modular sysctls. Errors due to e.g. invalid parameter name or value are reported in the kernel log. The processing is hooked right before the init process is loaded, as some handlers might be more complicated than simple setters and might need some subsystems to be initialized. At the moment the init process can be started and eventually execute a process writing to /proc/sys/ then it should be also fine to do that from the kernel. Sysctls registered later on module load time are not set by this mechanism - it's expected that in such scenarios, setting sysctl values from userspace is practical enough. [1] https://lore.kernel.org/r/BL0PR02MB560167492CA4094C91589930E9FC0@BL0PR02MB5601.namprd02.prod.outlook.com/ [2] https://unix.stackexchange.com/questions/558802/how-to-set-sysctl-using-kernel-command-line-parameter [3] https://lore.kernel.org/r/87bloj2skm.fsf@x220.int.ebiederm.org/ Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Acked-by: Kees Cook <keescook@chromium.org> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Iurii Zaikin <yzaikin@google.com> Cc: Ivan Teterevkov <ivan.teterevkov@nutanix.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: "Eric W . Biederman" <ebiederm@xmission.com> Cc: "Guilherme G . Piccoli" <gpiccoli@canonical.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Christian Brauner <christian.brauner@ubuntu.com> Link: http://lkml.kernel.org/r/20200427180433.7029-1-vbabka@suse.cz Link: http://lkml.kernel.org/r/20200427180433.7029-2-vbabka@suse.cz Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-08vfs: clean up posix_acl_permission() logic aroudn MAY_NOT_BLOCKLinus Torvalds
posix_acl_permission() does not care about MAY_NOT_BLOCK, and in fact the permission logic internally must not check that bit (it's only for upper layers to decide whether they can block to do IO to look up the acl information or not). But the way the code was written, it _looked_ like it cared, since the function explicitly did not mask that bit off. But it has exactly two callers: one for when that bit is set, which first clears the bit before calling posix_acl_permission(), and the other call site when that bit was clear. So stop the silly games "saving" the MAY_NOT_BLOCK bit that must not be used for the actual permission test, and that currently is pointlessly cleared by the callers when the function itself should just not care. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-08vfs: do not do group lookup when not necessaryLinus Torvalds
Rasmus Villemoes points out that the 'in_group_p()' tests can be a noticeable expense, and often completely unnecessary. A common situation is that the 'group' bits are the same as the 'other' bits wrt the permissions we want to test. So rewrite 'acl_permission_check()' to not bother checking for group ownership when the permission check doesn't care. For example, if we're asking for read permissions, and both 'group' and 'other' allow reading, there's really no reason to check if we're part of the group or not: either way, we'll allow it. Rasmus says: "On a bog-standard Ubuntu 20.04 install, a workload consisting of compiling lots of userspace programs (i.e., calling lots of short-lived programs that all need to get their shared libs mapped in, and the compilers poking around looking for system headers - lots of /usr/lib, /usr/bin, /usr/include/ accesses) puts in_group_p around 0.1% according to perf top. System-installed files are almost always 0755 (directories and binaries) or 0644, so in most cases, we can avoid the binary search and the cost of pulling the cred->groups array and in_group_p() .text into the cpu cache" Reported-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-07Merge tag 'apparmor-pr-2020-06-07' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor Pull apparmor updates from John Johansen: "Features: - Replace zero-length array with flexible-array - add a valid state flags check - add consistency check between state and dfa diff encode flags - add apparmor subdir to proc attr interface - fail unpack if profile mode is unknown - add outofband transition and use it in xattr match - ensure that dfa state tables have entries Cleanups: - Use true and false for bool variable - Remove semicolon - Clean code by removing redundant instructions - Replace two seq_printf() calls by seq_puts() in aa_label_seq_xprint() - remove duplicate check of xattrs on profile attachment - remove useless aafs_create_symlink Bug fixes: - Fix memory leak of profile proxy - fix introspection of of task mode for unconfined tasks - fix nnp subset test for unconfined - check/put label on apparmor_sk_clone_security()" * tag 'apparmor-pr-2020-06-07' of git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor: apparmor: Fix memory leak of profile proxy apparmor: fix introspection of of task mode for unconfined tasks apparmor: check/put label on apparmor_sk_clone_security() apparmor: Use true and false for bool variable security/apparmor/label.c: Clean code by removing redundant instructions apparmor: Replace zero-length array with flexible-array apparmor: ensure that dfa state tables have entries apparmor: remove duplicate check of xattrs on profile attachment. apparmor: add outofband transition and use it in xattr match apparmor: fail unpack if profile mode is unknown apparmor: fix nnp subset test for unconfined apparmor: remove useless aafs_create_symlink apparmor: add proc subdir to attrs apparmor: add consistency check between state and dfa diff encode flags apparmor: add a valid state flags check AppArmor: Remove semicolon apparmor: Replace two seq_printf() calls by seq_puts() in aa_label_seq_xprint()
2020-06-07Merge tag 'driver-core-5.8-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here is the set of driver core patches for 5.8-rc1. Not all that huge this release, just a number of small fixes and updates: - software node fixes - kobject now sends KOBJ_REMOVE when it is removed from sysfs, not when it is removed from memory (which could come much later) - device link additions and fixes based on testing on more devices - firmware core cleanups - other minor changes, full details in the shortlog All have been in linux-next for a while with no reported issues" * tag 'driver-core-5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (23 commits) driver core: Update device link status correctly for SYNC_STATE_ONLY links firmware_loader: change enum fw_opt to u32 software node: implement software_node_unregister() kobject: send KOBJ_REMOVE uevent when the object is removed from sysfs driver core: Remove unnecessary is_fwnode_dev variable in device_add() drivers property: When no children in primary, try secondary driver core: platform: Fix spelling errors in platform.c driver core: Remove check in driver_deferred_probe_force_trigger() of: platform: Batch fwnode parsing when adding all top level devices driver core: fw_devlink: Add support for batching fwnode parsing driver core: Look for waiting consumers only for a fwnode's primary device driver core: Move code to the right part of the file Revert "Revert "driver core: Set fw_devlink to "permissive" behavior by default"" drivers: base: Fix NULL pointer exception in __platform_driver_probe() if a driver developer is foolish firmware_loader: move fw_fallback_config to a private kernel symbol namespace driver core: Add missing '\n' in log messages driver/base/soc: Use kobj_to_dev() API Add documentation on meaning of -EPROBE_DEFER driver core: platform: remove redundant assignment to variable ret debugfs: Use the correct style for SPDX License Identifier ...
2020-06-06Merge tag 'sh-for-5.8' of git://git.libc.org/linux-shLinus Torvalds
Pull arch/sh updates from Rich Felker: "Fix for arch/sh build regression with newer binutils, removal of SH5, fixes for module exports, and misc cleanup" * tag 'sh-for-5.8' of git://git.libc.org/linux-sh: sh: remove sh5 support sh: add missing EXPORT_SYMBOL() for __delay sh: Convert ins[bwl]/outs[bwl] macros to inline functions sh: Convert iounmap() macros to inline functions sh: Add missing DECLARE_EXPORT() for __ashiftrt_r4_xx sh: configs: Cleanup old Kconfig IO scheduler options arch/sh: vmlinux.scr sh: Replace CONFIG_MTD_M25P80 with CONFIG_MTD_SPI_NOR in sh7757lcr_defconfig sh: sh4a: Bring back tmu3_device early device
2020-06-06hpfs: fix warning due to superfluous semicolonZou Wei
Fixes coccicheck warning: fs/hpfs/buffer.c:56:2-3: Unneeded semicolon Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Zou Wei <zou_wei@huawei.com> Signed-off-by: Mikulas Patocka <mikulas@twibright.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-05Merge tag 'for-linus-5.8-ofs1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux Pull orangefs updates from Mike Marshall: - John Hubbard's conversion from get_user_pages() to pin_user_pages() - Colin Ian King's removal of an unneeded variable initialization * tag 'for-linus-5.8-ofs1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux: orangefs: convert get_user_pages() --> pin_user_pages() orangefs: remove redundant assignment to variable ret
2020-06-05Merge tag 'dlm-5.8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm Pull dlm updates from David Teigland: "This set includes a couple minor cleanups, and dropping the interruptible from a wait_event that waits for an event from the userspace cluster management" * tag 'dlm-5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm: dlm: remove BUG() before panic() dlm: Switch to using wait_event() fs:dlm:remove unneeded semicolon in rcom.c dlm: user: Replace zero-length array with flexible-array member dlm: dlm_internal: Replace zero-length array with flexible-array member
2020-06-05Merge tag '5.8-rc-smb3-fixes-part-1' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull cifs updates from Steve French: "22 changesets, 2 for stable. Includes big performance improvement for large i/o when using multichannel, also includes DFS fixes" * tag '5.8-rc-smb3-fixes-part-1' of git://git.samba.org/sfrench/cifs-2.6: (22 commits) cifs: update internal module version number cifs: multichannel: try to rebind when reconnecting a channel cifs: multichannel: use pointer for binding channel smb3: remove static checker warning cifs: multichannel: move channel selection above transport layer cifs: multichannel: always zero struct cifs_io_parms cifs: dump Security Type info in DebugData smb3: fix incorrect number of credits when ioctl MaxOutputResponse > 64K smb3: default to minimum of two channels when multichannel specified cifs: multichannel: move channel selection in function cifs: fix minor typos in comments and log messages smb3: minor update to compression header definitions cifs: minor fix to two debug messages cifs: Standardize logging output smb3: Add new parm "nodelete" cifs: move some variables off the stack in smb2_ioctl_query_info cifs: reduce stack use in smb2_compound_op cifs: get rid of unused parameter in reconn_setup_dfs_targets() cifs: handle hostnames that resolve to same ip in failover cifs: set up next DFS target before generic_ip_connect() ...
2020-06-05Merge tag 'afs-next-20200604' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs Pull AFS updates from David Howells: "There's some core VFS changes which affect a couple of filesystems: - Make the inode hash table RCU safe and providing some RCU-safe accessor functions. The search can then be done without taking the inode_hash_lock. Care must be taken because the object may be being deleted and no wait is made. - Allow iunique() to avoid taking the inode_hash_lock. - Allow AFS's callback processing to avoid taking the inode_hash_lock when using the inode table to find an inode to notify. - Improve Ext4's time updating. Konstantin Khlebnikov said "For now, I've plugged this issue with try-lock in ext4 lazy time update. This solution is much better." Then there's a set of changes to make a number of improvements to the AFS driver: - Improve callback (ie. third party change notification) processing by: (a) Relying more on the fact we're doing this under RCU and by using fewer locks. This makes use of the RCU-based inode searching outlined above. (b) Moving to keeping volumes in a tree indexed by volume ID rather than a flat list. (c) Making the server and volume records logically part of the cell. This means that a server record now points directly at the cell and the tree of volumes is there. This removes an N:M mapping table, simplifying things. - Improve keeping NAT or firewall channels open for the server callbacks to reach the client by actively polling the fileserver on a timed basis, instead of only doing it when we have an operation to process. - Improving detection of delayed or lost callbacks by including the parent directory in the list of file IDs to be queried when doing a bulk status fetch from lookup. We can then check to see if our copy of the directory has changed under us without us getting notified. - Determine aliasing of cells (such as a cell that is pointed to be a DNS alias). This allows us to avoid having ambiguity due to apparently different cells using the same volume and file servers. - Improve the fileserver rotation to do more probing when it detects that all of the addresses to a server are listed as non-responsive. It's possible that an address that previously stopped responding has become responsive again. Beyond that, lay some foundations for making some calls asynchronous: - Turn the fileserver cursor struct into a general operation struct and hang the parameters off of that rather than keeping them in local variables and hang results off of that rather than the call struct. - Implement some general operation handling code and simplify the callers of operations that affect a volume or a volume component (such as a file). Most of the operation is now done by core code. - Operations are supplied with a table of operations to issue different variants of RPCs and to manage the completion, where all the required data is held in the operation object, thereby allowing these to be called from a workqueue. - Put the standard "if (begin), while(select), call op, end" sequence into a canned function that just emulates the current behaviour for now. There are also some fixes interspersed: - Don't let the EACCES from ICMP6 mapping reach the user as such, since it's confusing as to whether it's a filesystem error. Convert it to EHOSTUNREACH. - Don't use the epoch value acquired through probing a server. If we have two servers with the same UUID but in different cells, it's hard to draw conclusions from them having different epoch values. - Don't interpret the argument to the CB.ProbeUuid RPC as a fileserver UUID and look up a fileserver from it. - Deal with servers in different cells having the same UUIDs. In the event that a CB.InitCallBackState3 RPC is received, we have to break the callback promises for every server record matching that UUID. - Don't let afs_statfs return values that go below 0. - Don't use running fileserver probe state to make server selection and address selection decisions on. Only make decisions on final state as the running state is cleared at the start of probing" Acked-by: Al Viro <viro@zeniv.linux.org.uk> (fs/inode.c part) * tag 'afs-next-20200604' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: (27 commits) afs: Adjust the fileserver rotation algorithm to reprobe/retry more quickly afs: Show more a bit more server state in /proc/net/afs/servers afs: Don't use probe running state to make decisions outside probe code afs: Fix afs_statfs() to not let the values go below zero afs: Fix the by-UUID server tree to allow servers with the same UUID afs: Reorganise volume and server trees to be rooted on the cell afs: Add a tracepoint to track the lifetime of the afs_volume struct afs: Detect cell aliases 3 - YFS Cells with a canonical cell name op afs: Detect cell aliases 2 - Cells with no root volumes afs: Detect cell aliases 1 - Cells with root volumes afs: Implement client support for the YFSVL.GetCellName RPC op afs: Retain more of the VLDB record for alias detection afs: Fix handling of CB.ProbeUuid cache manager op afs: Don't get epoch from a server because it may be ambiguous afs: Build an abstraction around an "operation" concept afs: Rename struct afs_fs_cursor to afs_operation afs: Remove the error argument from afs_protocol_error() afs: Set error flag rather than return error from file status decode afs: Make callback processing more efficient. afs: Show more information in /proc/net/afs/servers ...
2020-06-05Merge tag 'ext4_for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: "A lot of bug fixes and cleanups for ext4, including: - Fix performance problems found in dioread_nolock now that it is the default, caused by transaction leaks. - Clean up fiemap handling in ext4 - Clean up and refactor multiple block allocator (mballoc) code - Fix a problem with mballoc with a smaller file systems running out of blocks because they couldn't properly use blocks that had been reserved by inode preallocation. - Fixed a race in ext4_sync_parent() versus rename() - Simplify the error handling in the extent manipulation code - Make sure all metadata I/O errors are felected to ext4_ext_dirty()'s and ext4_make_inode_dirty()'s callers. - Avoid passing an error pointer to brelse in ext4_xattr_set() - Fix race which could result to freeing an inode on the dirty last in data=journal mode. - Fix refcount handling if ext4_iget() fails - Fix a crash in generic/019 caused by a corrupted extent node" * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (58 commits) ext4: avoid unnecessary transaction starts during writeback ext4: don't block for O_DIRECT if IOCB_NOWAIT is set ext4: remove the access_ok() check in ext4_ioctl_get_es_cache fs: remove the access_ok() check in ioctl_fiemap fs: handle FIEMAP_FLAG_SYNC in fiemap_prep fs: move fiemap range validation into the file systems instances iomap: fix the iomap_fiemap prototype fs: move the fiemap definitions out of fs.h fs: mark __generic_block_fiemap static ext4: remove the call to fiemap_check_flags in ext4_fiemap ext4: split _ext4_fiemap ext4: fix fiemap size checks for bitmap files ext4: fix EXT4_MAX_LOGICAL_BLOCK macro add comment for ext4_dir_entry_2 file_type member jbd2: avoid leaking transaction credits when unreserving handle ext4: drop ext4_journal_free_reserved() ext4: mballoc: use lock for checking free blocks while retrying ext4: mballoc: refactor ext4_mb_good_group() ext4: mballoc: introduce pcpu seqcnt for freeing PA to improve ENOSPC handling ext4: mballoc: refactor ext4_mb_discard_preallocations() ...
2020-06-05Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds
Pull rdma updates from Jason Gunthorpe: "A more active cycle than most of the recent past, with a few large, long discussed works this time. The RNBD block driver has been posted for nearly two years now, and flowing through RDMA due to it also introducing a new ULP. The removal of FMR has been a recurring discussion theme for a long time. And the usual smattering of features and bug fixes. Summary: - Various small driver bugs fixes in rxe, mlx5, hfi1, and efa - Continuing driver cleanups in bnxt_re, hns - Big cleanup of mlx5 QP creation flows - More consistent use of src port and flow label when LAG is used and a mlx5 implementation - Additional set of cleanups for IB CM - 'RNBD' network block driver and target. This is a network block RDMA device specific to ionos's cloud environment. It brings strong multipath and resiliency capabilities. - Accelerated IPoIB for HFI1 - QP/WQ/SRQ ioctl migration for uverbs, and support for multiple async fds - Support for exchanging the new IBTA defiend ECE data during RDMA CM exchanges - Removal of the very old and insecure FMR interface from all ULPs and drivers. FRWR should be preferred for at least a decade now" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (247 commits) RDMA/cm: Spurious WARNING triggered in cm_destroy_id() RDMA/mlx5: Return ECE DC support RDMA/mlx5: Don't rely on FW to set zeros in ECE response RDMA/mlx5: Return an error if copy_to_user fails IB/hfi1: Use free_netdev() in hfi1_netdev_free() RDMA/hns: Uninitialized variable in modify_qp_init_to_rtr() RDMA/core: Move and rename trace_cm_id_create() IB/hfi1: Fix hfi1_netdev_rx_init() error handling RDMA: Remove 'max_map_per_fmr' RDMA: Remove 'max_fmr' RDMA/core: Remove FMR device ops RDMA/rdmavt: Remove FMR memory registration RDMA/mthca: Remove FMR support for memory registration RDMA/mlx4: Remove FMR support for memory registration RDMA/i40iw: Remove FMR leftovers RDMA/bnxt_re: Remove FMR leftovers RDMA/mlx5: Remove FMR leftovers RDMA/core: Remove FMR pool API RDMA/rds: Remove FMR support for memory registration RDMA/srp: Remove support for FMR memory registration ...
2020-06-05Merge tag 'core_core_updates_for_5.8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull READ_IMPLIES_EXEC changes from Borislav Petkov: "Split the old READ_IMPLIES_EXEC workaround from executable PT_GNU_STACK now that toolchains long support PT_GNU_STACK marking and there's no need anymore to force modern programs into having all its user mappings executable instead of only the stack and the PROT_EXEC ones. Disable that automatic READ_IMPLIES_EXEC forcing on x86-64 and arm64. Add tables documenting how READ_IMPLIES_EXEC is handled on x86-64, arm and arm64. By Kees Cook" * tag 'core_core_updates_for_5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: arm64/elf: Disable automatic READ_IMPLIES_EXEC for 64-bit address spaces arm32/64/elf: Split READ_IMPLIES_EXEC from executable PT_GNU_STACK arm32/64/elf: Add tables to document READ_IMPLIES_EXEC x86/elf: Disable automatic READ_IMPLIES_EXEC on 64-bit x86/elf: Split READ_IMPLIES_EXEC from executable PT_GNU_STACK x86/elf: Add table to document READ_IMPLIES_EXEC
2020-06-04Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge yet more updates from Andrew Morton: - More MM work. 100ish more to go. Mike Rapoport's "mm: remove __ARCH_HAS_5LEVEL_HACK" series should fix the current ppc issue - Various other little subsystems * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (127 commits) lib/ubsan.c: fix gcc-10 warnings tools/testing/selftests/vm: remove duplicate headers selftests: vm: pkeys: fix multilib builds for x86 selftests: vm: pkeys: use the correct page size on powerpc selftests/vm/pkeys: override access right definitions on powerpc selftests/vm/pkeys: test correct behaviour of pkey-0 selftests/vm/pkeys: introduce a sub-page allocator selftests/vm/pkeys: detect write violation on a mapped access-denied-key page selftests/vm/pkeys: associate key on a mapped page and detect write violation selftests/vm/pkeys: associate key on a mapped page and detect access violation selftests/vm/pkeys: improve checks to determine pkey support selftests/vm/pkeys: fix assertion in test_pkey_alloc_exhaust() selftests/vm/pkeys: fix number of reserved powerpc pkeys selftests/vm/pkeys: introduce powerpc support selftests/vm/pkeys: introduce generic pkey abstractions selftests: vm: pkeys: use the correct huge page size selftests/vm/pkeys: fix alloc_random_pkey() to make it really random selftests/vm/pkeys: fix assertion in pkey_disable_set/clear() selftests/vm/pkeys: fix pkey_disable_clear() selftests: vm: pkeys: add helpers for pkey bits ...
2020-06-04exec: open code copy_string_kernelChristoph Hellwig
Currently copy_string_kernel is just a wrapper around copy_strings that simplifies the calling conventions and uses set_fs to allow passing a kernel pointer. But due to the fact the we only need to handle a single kernel argument pointer, the logic can be sigificantly simplified while getting rid of the set_fs. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Link: http://lkml.kernel.org/r/20200501104105.2621149-3-hch@lst.de Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-04exec: simplify the copy_strings_kernel calling conventionChristoph Hellwig
copy_strings_kernel is always used with a single argument, adjust the calling convention to that. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Link: http://lkml.kernel.org/r/20200501104105.2621149-2-hch@lst.de Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-04fs/seq_file.c: seq_read: Update pr_info_ratelimitedJoe Perches
Use a more common logging style. Add and use pr_fmt, coalesce the format string, align arguments, use better grammar. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Vasily Averin <vvs@virtuozzo.com> Link: http://lkml.kernel.org/r/96ff603230ca1bd60034c36519be3930c3a3a226.camel@perches.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-04fat: improve the readahead for FAT entriesOGAWA Hirofumi
Current readahead for FAT entries is very simple but is having some flaws, so it is not working well for some environments. This patch improves the readahead more or less. The key points of modification are, - make the readahead size tunable by using bdi->ra_pages - care the bdi->io_pages to avoid the small size I/O request - update readahead window before fully exhausting With this patch, on slow USB connected 2TB hdd: [before] 383.18sec [after] 51.03sec Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: hyeongseok.kim <hyeongseok.kim@lge.com> Reviewed-by: hyeongseok.kim <hyeongseok.kim@lge.com> Link: http://lkml.kernel.org/r/87d08e1dlh.fsf@mail.parknet.co.jp Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-04fat: don't allow to mount if the FAT length == 0OGAWA Hirofumi
If FAT length == 0, the image doesn't have any data. And it can be the cause of overlapping the root dir and FAT entries. Also Windows treats it as invalid format. Reported-by: syzbot+6f1624f937d9d6911e2d@syzkaller.appspotmail.com Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Marco Elver <elver@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Link: http://lkml.kernel.org/r/87r1wz8mrd.fsf@mail.parknet.co.jp Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-04fs/binfmt_elf: remove redundant elf_map ifndefAnthony Iliopoulos
The ifndef was added a long time ago to support archs that would define their own mapping function. The last user was the metag arch which was removed from the tree, and as such there are no users left. Let's kill it. Signed-off-by: Anthony Iliopoulos <ailiop@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/20200402161543.4119-1-ailiop@suse.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-04proc: rename "catch" function argumentAlexey Dobriyan
"catch" is reserved keyword in C++, rename it to something both gcc and g++ accept. Rename "ign" for symmetry. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/20200331210905.GA31680@avx2 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-04Merge branch 'exec-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull execve updates from Eric Biederman: "Last cycle for the Nth time I ran into bugs and quality of implementation issues related to exec that could not be easily be fixed because of the way exec is implemented. So I have been digging into exec and cleanup up what I can. I don't think I have exec sorted out enough to fix the issues I started with but I have made some headway this cycle with 4 sets of changes. - promised cleanups after introducing exec_update_mutex - trivial cleanups for exec - control flow simplifications - remove the recomputation of bprm->cred The net result is code that is a bit easier to understand and work with and a decrease in the number of lines of code (if you don't count the added tests)" * 'exec-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (24 commits) exec: Compute file based creds only once exec: Add a per bprm->file version of per_clear binfmt_elf_fdpic: fix execfd build regression selftests/exec: Add binfmt_script regression test exec: Remove recursion from search_binary_handler exec: Generic execfd support exec/binfmt_script: Don't modify bprm->buf and then return -ENOEXEC exec: Move the call of prepare_binprm into search_binary_handler exec: Allow load_misc_binary to call prepare_binprm unconditionally exec: Convert security_bprm_set_creds into security_bprm_repopulate_creds exec: Factor security_bprm_creds_for_exec out of security_bprm_set_creds exec: Teach prepare_exec_creds how exec treats uids & gids exec: Set the point of no return sooner exec: Move handling of the point of no return to the top level exec: Run sync_mm_rss before taking exec_update_mutex exec: Fix spelling of search_binary_handler in a comment exec: Move the comment from above de_thread to above unshare_sighand exec: Rename flush_old_exec begin_new_exec exec: Move most of setup_new_exec into flush_old_exec exec: In setup_new_exec cache current in the local variable me ...
2020-06-04Merge branch 'proc-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull proc updates from Eric Biederman: "This has four sets of changes: - modernize proc to support multiple private instances - ensure we see the exit of each process tid exactly - remove has_group_leader_pid - use pids not tasks in posix-cpu-timers lookup Alexey updated proc so each mount of proc uses a new superblock. This allows people to actually use mount options with proc with no fear of messing up another mount of proc. Given the kernel's internal mounts of proc for things like uml this was a real problem, and resulted in Android's hidepid mount options being ignored and introducing security issues. The rest of the changes are small cleanups and fixes that came out of my work to allow this change to proc. In essence it is swapping the pids in de_thread during exec which removes a special case the code had to handle. Then updating the code to stop handling that special case" * 'proc-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: proc: proc_pid_ns takes super_block as an argument remove the no longer needed pid_alive() check in __task_pid_nr_ns() posix-cpu-timers: Replace __get_task_for_clock with pid_for_clock posix-cpu-timers: Replace cpu_timer_pid_type with clock_pid_type posix-cpu-timers: Extend rcu_read_lock removing task_struct references signal: Remove has_group_leader_pid exec: Remove BUG_ON(has_group_leader_pid) posix-cpu-timer: Unify the now redundant code in lookup_task posix-cpu-timer: Tidy up group_leader logic in lookup_task proc: Ensure we see the exit of each process tid exactly once rculist: Add hlists_swap_heads_rcu proc: Use PIDTYPE_TGID in next_tgid Use proc_pid_ns() to get pid_namespace from the proc superblock proc: use named enums for better readability proc: use human-readable values for hidepid docs: proc: add documentation for "hidepid=4" and "subset=pid" options and new mount behavior proc: add option to mount only a pids subset proc: instantiate only pids that we can ptrace on 'hidepid=4' mount option proc: allow to mount many instances of proc in one pid namespace proc: rename struct proc_fs_info to proc_fs_opts
2020-06-04Merge tag 'for_v5.8-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull ext2 and reiserfs cleanups from Jan Kara: "Two small cleanups for ext2 and one for reiserfs" * tag 'for_v5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: reiserfs: Replace kmalloc with kcalloc in the comment ext2: code cleanup by removing ifdef macro surrounding ext2: Fix i_op setting for special inode
2020-06-04Merge tag 'fsnotify_for_v5.8-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull fsnotify updates from Jan Kara: "Several smaller fixes and cleanups for fsnotify subsystem" * tag 'fsnotify_for_v5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: fanotify: fix ignore mask logic for events on child and on dir fanotify: don't write with size under sizeof(response) fsnotify: Remove proc_fs.h include fanotify: remove reference to fill_event_metadata() fsnotify: add mutex destroy fanotify: prefix should_merge() fanotify: Replace zero-length array with flexible-array inotify: Fix error return code assignment flow. fsnotify: Add missing annotation for fsnotify_finish_user_wait() and for fsnotify_prepare_user_wait()
2020-06-04Merge tag 'zonefs-5.8-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs Pull zonefs update from Damien Le Moal: "Only one patch in this pull request to cleanup handling of uuid using the import_uuid() helper, from Andy" * tag 'zonefs-5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs: zonefs: Replace uuid_copy() with import_uuid()
2020-06-04cifs: update internal module version numberSteve French
To 2.27 Signed-off-by: Steve French <stfrench@microsoft.com>
2020-06-04cifs: multichannel: try to rebind when reconnecting a channelAurelien Aptel
first steps in trying to make channels properly reconnect. * add cifs_ses_find_chan() function to find the enclosing cifs_chan struct it belongs to * while we have the session lock and are redoing negprot and sess.setup in smb2_reconnect() redo the binding of channels. Signed-off-by: Aurelien Aptel <aaptel@suse.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2020-06-04cifs: multichannel: use pointer for binding channelAurelien Aptel
Add a cifs_chan pointer in struct cifs_ses that points to the channel currently being bound if ses->binding is true. Previously it was always the channel past the established count. This will make reconnecting (and rebinding) a channel easier later on. Signed-off-by: Aurelien Aptel <aaptel@suse.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2020-06-04smb3: remove static checker warningSteve French
Remove static checker warning pointed out by Dan Carpenter: The patch feeaec621c09: "cifs: multichannel: move channel selection above transport layer" from Apr 24, 2020, leads to the following static checker warning: fs/cifs/smb2pdu.c:149 smb2_hdr_assemble() error: we previously assumed 'tcon->ses' could be null (see line 133) Reported-by: Dan Carpenter <dan.carpenter@oracle.com> CC: Aurelien Aptel <aptel@suse.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2020-06-04cifs: multichannel: move channel selection above transport layerAurelien Aptel
Move the channel (TCP_Server_Info*) selection from the tranport layer to higher in the call stack so that: - credit handling is done with the server that will actually be used to send. * ->wait_mtu_credit * ->set_credits / set_credits * ->add_credits / add_credits * add_credits_and_wake_if - potential reconnection (smb2_reconnect) done when initializing a request is checked and done with the server that will actually be used to send. To do this: - remove the cifs_pick_channel() call out of compound_send_recv() - select channel and pass it down by adding a cifs_pick_channel(ses) call in: - smb311_posix_mkdir - SMB2_open - SMB2_ioctl - __SMB2_close - query_info - SMB2_change_notify - SMB2_flush - smb2_async_readv (if none provided in context param) - SMB2_read (if none provided in context param) - smb2_async_writev (if none provided in context param) - SMB2_write (if none provided in context param) - SMB2_query_directory - send_set_info - SMB2_oplock_break - SMB311_posix_qfs_info - SMB2_QFS_info - SMB2_QFS_attr - smb2_lockv - SMB2_lease_break - smb2_compound_op - smb2_set_ea - smb2_ioctl_query_info - smb2_query_dir_first - smb2_query_info_comound - smb2_query_symlink - cifs_writepages - cifs_write_from_iter - cifs_send_async_read - cifs_read - cifs_readpages - add TCP_Server_Info *server param argument to: - cifs_send_recv - compound_send_recv - SMB2_open_init - SMB2_query_info_init - SMB2_set_info_init - SMB2_close_init - SMB2_ioctl_init - smb2_iotcl_req_init - SMB2_query_directory_init - SMB2_notify_init - SMB2_flush_init - build_qfs_info_req - smb2_hdr_assemble - smb2_reconnect - fill_small_buf - smb2_plain_req_init - __smb2_plain_req_init The read/write codepath is different than the rest as it is using pages, io iterators and async calls. To deal with those we add a server pointer in the cifs_writedata/cifs_readdata/cifs_io_parms context struct and set it in: - cifs_writepages (wdata) - cifs_write_from_iter (wdata) - cifs_readpages (rdata) - cifs_send_async_read (rdata) The [rw]data->server pointer is eventually copied to cifs_io_parms->server to pass it down to SMB2_read/SMB2_write. If SMB2_read/SMB2_write is called from a different place that doesn't set the server field it will pick a channel. Some places do not pick a channel and just use ses->server or cifs_ses_server(ses). All cifs_ses_server(ses) calls are in codepaths involving negprot/sess.setup. - SMB2_negotiate (binding channel) - SMB2_sess_alloc_buffer (binding channel) - SMB2_echo (uses provided one) - SMB2_logoff (uses master) - SMB2_tdis (uses master) (list not exhaustive) Signed-off-by: Aurelien Aptel <aaptel@suse.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2020-06-04cifs: multichannel: always zero struct cifs_io_parmsAurelien Aptel
SMB2_read/SMB2_write check and use cifs_io_parms->server, which might be uninitialized memory. This change makes all callers zero-initialize the struct. Signed-off-by: Aurelien Aptel <aaptel@suse.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2020-06-04cifs: dump Security Type info in DebugDataKenneth D'souza
Currently the end user is unaware with what sec type the cifs share is mounted if no sec=<type> option is parsed. With this patch one can easily check from DebugData. Example: 1) Name: x.x.x.x Uses: 1 Capability: 0x8001f3fc Session Status: 1 Security type: RawNTLMSSP Signed-off-by: Kenneth D'souza <kdsouza@redhat.com> Signed-off-by: Roberto Bergantinos Corpas <rbergant@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com> Acked-by: Aurelien Aptel <aaptel@suse.com>
2020-06-04afs: Adjust the fileserver rotation algorithm to reprobe/retry more quicklyDavid Howells
Adjust the fileserver rotation algorithm so that if we've tried all the addresses on a server (cumulatively over multiple operations) until we've run out of untried addresses, immediately reprobe all that server's interfaces and retry the op at least once before we move onto the next server. Signed-off-by: David Howells <dhowells@redhat.com>
2020-06-04afs: Show more a bit more server state in /proc/net/afs/serversDavid Howells
Display more information about the state of a server record, including the flags, rtt and break counter plus the probe state for each server in /proc/net/afs/servers. Rearrange the server flags a bit to make them easier to read at a glance in the proc file. Signed-off-by: David Howells <dhowells@redhat.com>
2020-06-04afs: Don't use probe running state to make decisions outside probe codeDavid Howells
Don't use the running state for fileserver probes to make decisions about which server to use as the state is cleared at the start of a probe and also intermediate values might be misleading. Instead, add a separate 'latest known' rtt in the afs_server struct and a flag to indicate if the server is known to be responding and update these as and when we know what to change them to. Signed-off-by: David Howells <dhowells@redhat.com>
2020-06-04afs: Fix afs_statfs() to not let the values go below zeroDavid Howells
Fix afs_statfs() so that the value for f_bavail and f_bfree don't go "negative" if the number of blocks in use by a volume exceeds the max quota for that volume. Signed-off-by: David Howells <dhowells@redhat.com>
2020-06-04afs: Fix the by-UUID server tree to allow servers with the same UUIDDavid Howells
Whilst it shouldn't happen, it is possible for multiple fileservers to share a UUID, particularly if an entire cell has been duplicated, UUIDs and all. In such a case, it's not necessarily possible to map the effect of the CB.InitCallBackState3 incoming RPC to a specific server unambiguously by UUID and thus to a specific cell. Indeed, there's a problem whereby multiple server records may need to occupy the same spot in the rb_tree rooted in the afs_net struct. Fix this by allowing servers to form a list, with the head of the list in the tree. When the front entry in the list is removed, the second in the list just replaces it. afs_init_callback_state() then just goes down the line, poking each server in the list. This means that some servers will be unnecessarily poked, unfortunately. An alternative would be to route by call parameters. Reported-by: Jeffrey Altman <jaltman@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com> Fixes: d2ddc776a458 ("afs: Overhaul volume and server record caching and fileserver rotation")
2020-06-04afs: Reorganise volume and server trees to be rooted on the cellDavid Howells
Reorganise afs_volume objects such that they're in a tree keyed on volume ID, rooted at on an afs_cell object rather than being in multiple trees, each of which is rooted on an afs_server object. afs_server structs become per-cell and acquire a pointer to the cell. The process of breaking a callback then starts with finding the server by its network address, following that to the cell and then looking up each volume ID in the volume tree. This is simpler than the afs_vol_interest/afs_cb_interest N:M mapping web and allows those structs and the code for maintaining them to be simplified or removed. It does make a couple of things a bit more tricky, though: (1) Operations now start with a volume, not a server, so there can be more than one answer as to whether or not the server we'll end up using supports the FS.InlineBulkStatus RPC. (2) CB RPC operations that specify the server UUID. There's still a tree of servers by UUID on the afs_net struct, but the UUIDs in it aren't guaranteed unique. Signed-off-by: David Howells <dhowells@redhat.com>
2020-06-04afs: Add a tracepoint to track the lifetime of the afs_volume structDavid Howells
Add a tracepoint to track the lifetime of the afs_volume struct. Signed-off-by: David Howells <dhowells@redhat.com>
2020-06-04afs: Detect cell aliases 3 - YFS Cells with a canonical cell name opDavid Howells
YFS Volume Location servers have an operation by which the cell name may be queried. Use this to find out what a YFS server thinks the canonical cell name should be. Signed-off-by: David Howells <dhowells@redhat.com>
2020-06-04afs: Detect cell aliases 2 - Cells with no root volumesDavid Howells
Implement the second phase of cell alias detection. This part handles alias detection for cells that don't have root.cell volumes and so we have to find some other volume or fileserver to query. We take the first volume from each such cell and attempt to look it up in the new cell. If found, we compare the records, if they are the same, we judge the cell names to be aliases. Signed-off-by: David Howells <dhowells@redhat.com>