diff options
author | Linus Torvalds | 2016-05-17 11:01:31 -0700 |
---|---|---|
committer | Linus Torvalds | 2016-05-17 11:01:31 -0700 |
commit | 7f427d3a6029331304f91ef4d7cf646f054216d2 (patch) | |
tree | 61c4a7b9b0ec387da0536324cc2c07b2427b9b46 /include | |
parent | ede40902cf80714ece199977b308e2ee437cae0b (diff) | |
parent | 0e0162bb8c008fa7742f69d4d4982c8a37b88f95 (diff) |
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull parallel filesystem directory handling update from Al Viro.
This is the main parallel directory work by Al that makes the vfs layer
able to do lookup and readdir in parallel within a single directory.
That's a big change, since this used to be all protected by the
directory inode mutex.
The inode mutex is replaced by an rwsem, and serialization of lookups of
a single name is done by a "in-progress" dentry marker.
The series begins with xattr cleanups, and then ends with switching
filesystems over to actually doing the readdir in parallel (switching to
the "iterate_shared()" that only takes the read lock).
A more detailed explanation of the process from Al Viro:
"The xattr work starts with some acl fixes, then switches ->getxattr to
passing inode and dentry separately. This is the point where the
things start to get tricky - that got merged into the very beginning
of the -rc3-based #work.lookups, to allow untangling the
security_d_instantiate() mess. The xattr work itself proceeds to
switch a lot of filesystems to generic_...xattr(); no complications
there.
After that initial xattr work, the series then does the following:
- untangle security_d_instantiate()
- convert a bunch of open-coded lookup_one_len_unlocked() to calls of
that thing; one such place (in overlayfs) actually yields a trivial
conflict with overlayfs fixes later in the cycle - overlayfs ended
up switching to a variant of lookup_one_len_unlocked() sans the
permission checks. I would've dropped that commit (it gets
overridden on merge from #ovl-fixes in #for-next; proper resolution
is to use the variant in mainline fs/overlayfs/super.c), but I
didn't want to rebase the damn thing - it was fairly late in the
cycle...
- some filesystems had managed to depend on lookup/lookup exclusion
for *fs-internal* data structures in a way that would break if we
relaxed the VFS exclusion. Fixing hadn't been hard, fortunately.
- core of that series - parallel lookup machinery, replacing
->i_mutex with rwsem, making lookup_slow() take it only shared. At
that point lookups happen in parallel; lookups on the same name
wait for the in-progress one to be done with that dentry.
Surprisingly little code, at that - almost all of it is in
fs/dcache.c, with fs/namei.c changes limited to lookup_slow() -
making it use the new primitive and actually switching to locking
shared.
- parallel readdir stuff - first of all, we provide the exclusion on
per-struct file basis, same as we do for read() vs lseek() for
regular files. That takes care of most of the needed exclusion in
readdir/readdir; however, these guys are trickier than lookups, so
I went for switching them one-by-one. To do that, a new method
'->iterate_shared()' is added and filesystems are switched to it
as they are either confirmed to be OK with shared lock on directory
or fixed to be OK with that. I hope to kill the original method
come next cycle (almost all in-tree filesystems are switched
already), but it's still not quite finished.
- several filesystems get switched to parallel readdir. The
interesting part here is dealing with dcache preseeding by readdir;
that needs minor adjustment to be safe with directory locked only
shared.
Most of the filesystems doing that got switched to in those
commits. Important exception: NFS. Turns out that NFS folks, with
their, er, insistence on VFS getting the fuck out of the way of the
Smart Filesystem Code That Knows How And What To Lock(tm) have
grown the locking of their own. They had their own homegrown
rwsem, with lookup/readdir/atomic_open being *writers* (sillyunlink
is the reader there). Of course, with VFS getting the fuck out of
the way, as requested, the actual smarts of the smart filesystem
code etc. had become exposed...
- do_last/lookup_open/atomic_open cleanups. As the result, open()
without O_CREAT locks the directory only shared. Including the
->atomic_open() case. Backmerge from #for-linus in the middle of
that - atomic_open() fix got brought in.
- then comes NFS switch to saner (VFS-based ;-) locking, killing the
homegrown "lookup and readdir are writers" kinda-sorta rwsem. All
exclusion for sillyunlink/lookup is done by the parallel lookups
mechanism. Exclusion between sillyunlink and rmdir is a real rwsem
now - rmdir being the writer.
Result: NFS lookups/readdirs/O_CREAT-less opens happen in parallel
now.
- the rest of the series consists of switching a lot of filesystems
to parallel readdir; in a lot of cases ->llseek() gets simplified
as well. One backmerge in there (again, #for-linus - rockridge
fix)"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (74 commits)
ext4: switch to ->iterate_shared()
hfs: switch to ->iterate_shared()
hfsplus: switch to ->iterate_shared()
hostfs: switch to ->iterate_shared()
hpfs: switch to ->iterate_shared()
hpfs: handle allocation failures in hpfs_add_pos()
gfs2: switch to ->iterate_shared()
f2fs: switch to ->iterate_shared()
afs: switch to ->iterate_shared()
befs: switch to ->iterate_shared()
befs: constify stuff a bit
isofs: switch to ->iterate_shared()
get_acorn_filename(): deobfuscate a bit
btrfs: switch to ->iterate_shared()
logfs: no need to lock directory in lseek
switch ecryptfs to ->iterate_shared
9p: switch to ->iterate_shared()
fat: switch to ->iterate_shared()
romfs, squashfs: switch to ->iterate_shared()
more trivial ->iterate_shared conversions
...
Diffstat (limited to 'include')
-rw-r--r-- | include/linux/dcache.h | 26 | ||||
-rw-r--r-- | include/linux/file.h | 13 | ||||
-rw-r--r-- | include/linux/fs.h | 51 | ||||
-rw-r--r-- | include/linux/nfs_fs.h | 11 | ||||
-rw-r--r-- | include/linux/nfs_xdr.h | 4 | ||||
-rw-r--r-- | include/linux/posix_acl.h | 1 | ||||
-rw-r--r-- | include/linux/xattr.h | 5 | ||||
-rw-r--r-- | include/trace/events/ext4.h | 6 |
8 files changed, 93 insertions, 24 deletions
diff --git a/include/linux/dcache.h b/include/linux/dcache.h index 7e9422cb5989..f8506e8dd4d4 100644 --- a/include/linux/dcache.h +++ b/include/linux/dcache.h @@ -123,7 +123,10 @@ struct dentry { unsigned long d_time; /* used by d_revalidate */ void *d_fsdata; /* fs-specific data */ - struct list_head d_lru; /* LRU list */ + union { + struct list_head d_lru; /* LRU list */ + wait_queue_head_t *d_wait; /* in-lookup ones only */ + }; struct list_head d_child; /* child of parent list */ struct list_head d_subdirs; /* our children */ /* @@ -131,6 +134,7 @@ struct dentry { */ union { struct hlist_node d_alias; /* inode alias list */ + struct hlist_bl_node d_in_lookup_hash; /* only for in-lookup ones */ struct rcu_head d_rcu; } d_u; }; @@ -232,6 +236,8 @@ struct dentry_operations { #define DCACHE_ENCRYPTED_WITH_KEY 0x04000000 /* dir is encrypted with a valid key */ #define DCACHE_OP_REAL 0x08000000 +#define DCACHE_PAR_LOOKUP 0x10000000 /* being looked up (with parent locked shared) */ + extern seqlock_t rename_lock; /* @@ -248,6 +254,8 @@ extern void d_set_d_op(struct dentry *dentry, const struct dentry_operations *op /* allocate/de-allocate */ extern struct dentry * d_alloc(struct dentry *, const struct qstr *); extern struct dentry * d_alloc_pseudo(struct super_block *, const struct qstr *); +extern struct dentry * d_alloc_parallel(struct dentry *, const struct qstr *, + wait_queue_head_t *); extern struct dentry * d_splice_alias(struct inode *, struct dentry *); extern struct dentry * d_add_ci(struct dentry *, struct inode *, struct qstr *); extern struct dentry * d_exact_alias(struct dentry *, struct inode *); @@ -367,6 +375,22 @@ static inline void dont_mount(struct dentry *dentry) spin_unlock(&dentry->d_lock); } +extern void __d_lookup_done(struct dentry *); + +static inline int d_in_lookup(struct dentry *dentry) +{ + return dentry->d_flags & DCACHE_PAR_LOOKUP; +} + +static inline void d_lookup_done(struct dentry *dentry) +{ + if (unlikely(d_in_lookup(dentry))) { + spin_lock(&dentry->d_lock); + __d_lookup_done(dentry); + spin_unlock(&dentry->d_lock); + } +} + extern void dput(struct dentry *); static inline bool d_managed(const struct dentry *dentry) diff --git a/include/linux/file.h b/include/linux/file.h index f87d30882a24..7444f5feda12 100644 --- a/include/linux/file.h +++ b/include/linux/file.h @@ -44,6 +44,7 @@ extern struct file *fget_raw(unsigned int fd); extern unsigned long __fdget(unsigned int fd); extern unsigned long __fdget_raw(unsigned int fd); extern unsigned long __fdget_pos(unsigned int fd); +extern void __f_unlock_pos(struct file *); static inline struct fd __to_fd(unsigned long v) { @@ -60,6 +61,18 @@ static inline struct fd fdget_raw(unsigned int fd) return __to_fd(__fdget_raw(fd)); } +static inline struct fd fdget_pos(int fd) +{ + return __to_fd(__fdget_pos(fd)); +} + +static inline void fdput_pos(struct fd f) +{ + if (f.flags & FDPUT_POS_UNLOCK) + __f_unlock_pos(f.file); + fdput(f); +} + extern int f_dupfd(unsigned int from, struct file *file, unsigned flags); extern int replace_fd(unsigned fd, struct file *file, unsigned flags); extern void set_close_on_exec(unsigned int fd, int flag); diff --git a/include/linux/fs.h b/include/linux/fs.h index 70e61b58baaf..e87245ac6941 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -577,6 +577,18 @@ static inline void mapping_allow_writable(struct address_space *mapping) struct posix_acl; #define ACL_NOT_CACHED ((void *)(-1)) +static inline struct posix_acl * +uncached_acl_sentinel(struct task_struct *task) +{ + return (void *)task + 1; +} + +static inline bool +is_uncached_acl(struct posix_acl *acl) +{ + return (long)acl & 1; +} + #define IOP_FASTPERM 0x0001 #define IOP_LOOKUP 0x0002 #define IOP_NOFOLLOW 0x0004 @@ -635,7 +647,7 @@ struct inode { /* Misc */ unsigned long i_state; - struct mutex i_mutex; + struct rw_semaphore i_rwsem; unsigned long dirtied_when; /* jiffies of first dirtying */ unsigned long dirtied_time_when; @@ -672,6 +684,7 @@ struct inode { struct block_device *i_bdev; struct cdev *i_cdev; char *i_link; + unsigned i_dir_seq; }; __u32 i_generation; @@ -721,27 +734,42 @@ enum inode_i_mutex_lock_class static inline void inode_lock(struct inode *inode) { - mutex_lock(&inode->i_mutex); + down_write(&inode->i_rwsem); } static inline void inode_unlock(struct inode *inode) { - mutex_unlock(&inode->i_mutex); + up_write(&inode->i_rwsem); +} + +static inline void inode_lock_shared(struct inode *inode) +{ + down_read(&inode->i_rwsem); +} + +static inline void inode_unlock_shared(struct inode *inode) +{ + up_read(&inode->i_rwsem); } static inline int inode_trylock(struct inode *inode) { - return mutex_trylock(&inode->i_mutex); + return down_write_trylock(&inode->i_rwsem); +} + +static inline int inode_trylock_shared(struct inode *inode) +{ + return down_read_trylock(&inode->i_rwsem); } static inline int inode_is_locked(struct inode *inode) { - return mutex_is_locked(&inode->i_mutex); + return rwsem_is_locked(&inode->i_rwsem); } static inline void inode_lock_nested(struct inode *inode, unsigned subclass) { - mutex_lock_nested(&inode->i_mutex, subclass); + down_write_nested(&inode->i_rwsem, subclass); } void lock_two_nondirectories(struct inode *, struct inode*); @@ -1646,6 +1674,7 @@ struct file_operations { ssize_t (*read_iter) (struct kiocb *, struct iov_iter *); ssize_t (*write_iter) (struct kiocb *, struct iov_iter *); int (*iterate) (struct file *, struct dir_context *); + int (*iterate_shared) (struct file *, struct dir_context *); unsigned int (*poll) (struct file *, struct poll_table_struct *); long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long); long (*compat_ioctl) (struct file *, unsigned int, unsigned long); @@ -1700,7 +1729,8 @@ struct inode_operations { int (*setattr) (struct dentry *, struct iattr *); int (*getattr) (struct vfsmount *mnt, struct dentry *, struct kstat *); int (*setxattr) (struct dentry *, const char *,const void *,size_t,int); - ssize_t (*getxattr) (struct dentry *, const char *, void *, size_t); + ssize_t (*getxattr) (struct dentry *, struct inode *, + const char *, void *, size_t); ssize_t (*listxattr) (struct dentry *, char *, size_t); int (*removexattr) (struct dentry *, const char *); int (*fiemap)(struct inode *, struct fiemap_extent_info *, u64 start, @@ -3104,6 +3134,13 @@ static inline bool dir_relax(struct inode *inode) return !IS_DEADDIR(inode); } +static inline bool dir_relax_shared(struct inode *inode) +{ + inode_unlock_shared(inode); + inode_lock_shared(inode); + return !IS_DEADDIR(inode); +} + extern bool path_noexec(const struct path *path); extern void inode_nohighmem(struct inode *inode); diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h index 67300f8e5f2f..fa167f25465d 100644 --- a/include/linux/nfs_fs.h +++ b/include/linux/nfs_fs.h @@ -163,11 +163,9 @@ struct nfs_inode { /* Open contexts for shared mmap writes */ struct list_head open_files; - /* Number of in-flight sillydelete RPC calls */ - atomic_t silly_count; - /* List of deferred sillydelete requests */ - struct hlist_head silly_list; - wait_queue_head_t waitqueue; + /* Readers: in-flight sillydelete RPC calls */ + /* Writers: rmdir */ + struct rw_semaphore rmdir_sem; #if IS_ENABLED(CONFIG_NFS_V4) struct nfs4_cached_acl *nfs4_acl; @@ -492,9 +490,6 @@ extern void nfs_release_automount_timer(void); * linux/fs/nfs/unlink.c */ extern void nfs_complete_unlink(struct dentry *dentry, struct inode *); -extern void nfs_wait_on_sillyrename(struct dentry *dentry); -extern void nfs_block_sillyrename(struct dentry *dentry); -extern void nfs_unblock_sillyrename(struct dentry *dentry); /* * linux/fs/nfs/write.c diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h index d320906cf13e..ee8491dadbf3 100644 --- a/include/linux/nfs_xdr.h +++ b/include/linux/nfs_xdr.h @@ -1468,10 +1468,10 @@ struct nfs_pgio_completion_ops { }; struct nfs_unlinkdata { - struct hlist_node list; struct nfs_removeargs args; struct nfs_removeres res; - struct inode *dir; + struct dentry *dentry; + wait_queue_head_t wq; struct rpc_cred *cred; struct nfs_fattr dir_attr; long timeout; diff --git a/include/linux/posix_acl.h b/include/linux/posix_acl.h index 3e96a6a76103..5b5a80cc5926 100644 --- a/include/linux/posix_acl.h +++ b/include/linux/posix_acl.h @@ -99,7 +99,6 @@ extern int posix_acl_create(struct inode *, umode_t *, struct posix_acl **, extern int simple_set_acl(struct inode *, struct posix_acl *, int); extern int simple_acl_create(struct inode *, struct inode *); -struct posix_acl **acl_by_type(struct inode *inode, int type); struct posix_acl *get_cached_acl(struct inode *inode, int type); struct posix_acl *get_cached_acl_rcu(struct inode *inode, int type); void set_cached_acl(struct inode *inode, int type, struct posix_acl *acl); diff --git a/include/linux/xattr.h b/include/linux/xattr.h index 4457541de3c9..1cc4c578deb9 100644 --- a/include/linux/xattr.h +++ b/include/linux/xattr.h @@ -30,7 +30,8 @@ struct xattr_handler { int flags; /* fs private flags */ bool (*list)(struct dentry *dentry); int (*get)(const struct xattr_handler *, struct dentry *dentry, - const char *name, void *buffer, size_t size); + struct inode *inode, const char *name, void *buffer, + size_t size); int (*set)(const struct xattr_handler *, struct dentry *dentry, const char *name, const void *buffer, size_t size, int flags); @@ -51,7 +52,7 @@ int __vfs_setxattr_noperm(struct dentry *, const char *, const void *, size_t, i int vfs_setxattr(struct dentry *, const char *, const void *, size_t, int); int vfs_removexattr(struct dentry *, const char *); -ssize_t generic_getxattr(struct dentry *dentry, const char *name, void *buffer, size_t size); +ssize_t generic_getxattr(struct dentry *dentry, struct inode *inode, const char *name, void *buffer, size_t size); ssize_t generic_listxattr(struct dentry *dentry, char *buffer, size_t buffer_size); int generic_setxattr(struct dentry *dentry, const char *name, const void *value, size_t size, int flags); int generic_removexattr(struct dentry *dentry, const char *name); diff --git a/include/trace/events/ext4.h b/include/trace/events/ext4.h index 4e4b2fa78609..09c71e9aaebf 100644 --- a/include/trace/events/ext4.h +++ b/include/trace/events/ext4.h @@ -872,7 +872,7 @@ TRACE_EVENT(ext4_sync_file_enter, TP_fast_assign( struct dentry *dentry = file->f_path.dentry; - __entry->dev = d_inode(dentry)->i_sb->s_dev; + __entry->dev = dentry->d_sb->s_dev; __entry->ino = d_inode(dentry)->i_ino; __entry->datasync = datasync; __entry->parent = d_inode(dentry->d_parent)->i_ino; @@ -1451,7 +1451,7 @@ TRACE_EVENT(ext4_unlink_enter, ), TP_fast_assign( - __entry->dev = d_inode(dentry)->i_sb->s_dev; + __entry->dev = dentry->d_sb->s_dev; __entry->ino = d_inode(dentry)->i_ino; __entry->parent = parent->i_ino; __entry->size = d_inode(dentry)->i_size; @@ -1475,7 +1475,7 @@ TRACE_EVENT(ext4_unlink_exit, ), TP_fast_assign( - __entry->dev = d_inode(dentry)->i_sb->s_dev; + __entry->dev = dentry->d_sb->s_dev; __entry->ino = d_inode(dentry)->i_ino; __entry->ret = ret; ), |