linux.git - Linux kernel

Age	Commit message	Author
2016-05-02	simple local filesystems: switch to ->iterate_shared()	Al Viro
	no changes needed (XFS isn't simple, but it has the same parallelism in the interesting parts exercised from CXFS). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-05-02	make ext2_get_page() and friends work without external serialization	Al Viro
	Right now ext2_get_page() (and its analogues in a bunch of other filesystems) relies upon the directory being locked - the way it sets and tests the Checked and Error bits would be racy without that. Switch to a slightly different scheme, _not_ setting Checked in case of failure. That way the logic becomes:
		if Checked => OK
		else if Error => fail
		else if !validate => fail
		else => OK
	with validation setting Checked or Error on success and failure respectively, and returning which one had happened. This is equivalent to the current logic but, unlike it, not sensitive to set_bit and test_bit getting reordered by the CPU, etc. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
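	A userspace sketch of the scheme, assuming C11 atomics in place of the kernel's page flag helpers; sketch_page, PAGE_CHECKED, PAGE_ERROR and contents_ok are hypothetical names, not ext2 code. The property that matters is that Checked is never set on a page that failed validation, so a reader that observes Checked can trust the page regardless of how the individual bit operations were ordered.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the Checked and Error page flags. */
enum { PAGE_CHECKED = 1u << 0, PAGE_ERROR = 1u << 1 };

struct sketch_page {
	atomic_uint flags;
	bool contents_ok;	/* stands in for "directory entries look sane" */
};

/* Validation records its verdict as exactly one of the two bits. */
static bool sketch_validate(struct sketch_page *p)
{
	if (p->contents_ok) {
		atomic_fetch_or(&p->flags, PAGE_CHECKED);
		return true;
	}
	atomic_fetch_or(&p->flags, PAGE_ERROR);
	return false;
}

/* Checked => OK, else Error => fail, else validation decides. */
static int sketch_get_page(struct sketch_page *p)
{
	unsigned f = atomic_load(&p->flags);

	if (f & PAGE_CHECKED)
		return 0;
	if (f & PAGE_ERROR)
		return -EIO;
	return sketch_validate(p) ? 0 : -EIO;
}
```

	In this sketch two racing validators of the same page reach the same verdict and set the same bit, so no interleaving can leave a page marked both Checked and Error.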
2016-05-02	Merge getxattr prototype change into work.lookups	Al Viro
	The rest of the work.xattr stuff isn't needed for this branch.
2016-04-10	don't bother with ->d_inode->i_sb - it's always equal to ->d_sb	Al Viro
	... and neither can ever be NULL Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-04-04	mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros	Kirill A. Shutemov
	PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced a long time ago with the promise that one day it would be possible to implement page cache with bigger chunks than PAGE_SIZE. This promise never materialized, and it is unlikely it ever will. We have many places where PAGE_CACHE_SIZE is assumed to be equal to PAGE_SIZE, and it's a constant source of confusion on whether a PAGE_CACHE_* or PAGE_* constant should be used in a particular case, especially on the border between fs and mm. Globally switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause too much breakage to be doable. Let's stop pretending that pages in page cache are special. They are not. The changes are pretty straightforward:
	- <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
	- <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
	- PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
	- page_cache_get() -> get_page();
	- page_cache_release() -> put_page();
	This patch contains automated changes generated with coccinelle using the script below. For some reason, coccinelle doesn't patch header files; I've called spatch for them manually. The only adjustment after coccinelle is reverting the changes to the PAGE_CACHE_ALIGN definition: we are going to drop it later. There are a few places in the code where coccinelle didn't reach; I'll fix them manually in a separate patch. Comments and documentation will also be addressed in a separate patch.
	virtual patch
	@@
	expression E;
	@@
	- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
	+ E
	@@
	expression E;
	@@
	- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
	+ E
	@@
	@@
	- PAGE_CACHE_SHIFT
	+ PAGE_SHIFT
	@@
	@@
	- PAGE_CACHE_SIZE
	+ PAGE_SIZE
	@@
	@@
	- PAGE_CACHE_MASK
	+ PAGE_MASK
	@@
	expression E;
	@@
	- PAGE_CACHE_ALIGN(E)
	+ PAGE_ALIGN(E)
	@@
	expression E;
	@@
	- page_cache_get(E)
	+ get_page(E)
	@@
	expression E;
	@@
	- page_cache_release(E)
	+ put_page(E)
	Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-14	kmemcg: account certain kmem allocations to memcg	Vladimir Davydov
	Mark those kmem allocations that are known to be easily triggered from userspace as __GFP_ACCOUNT/SLAB_ACCOUNT, which makes them accounted to memcg. For the list, see below:
	- threadinfo
	- task_struct
	- task_delay_info
	- pid
	- cred
	- mm_struct
	- vm_area_struct and vm_region (nommu)
	- anon_vma and anon_vma_chain
	- signal_struct
	- sighand_struct
	- fs_struct
	- files_struct
	- fdtable and fdtable->full_fds_bits
	- dentry and external_name
	- inode for all filesystems. This is the most tedious part, because most filesystems override the alloc_inode method.
	The list is far from complete, so feel free to add more objects. Nevertheless, it should be close to the "account everything" approach and keep most workloads within bounds. Malevolent users will be able to breach the limit, but this was possible even with the former "account everything" approach (simply because it did not account everything in fact). [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Tejun Heo <tj@kernel.org> Cc: Greg Thelen <gthelen@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-12-08	don't put symlink bodies in pagecache into highmem	Al Viro
	kmap() in page_follow_link_light() needed to go - allowing an arbitrary number of kmaps to be held for a long time is a great way to deadlock the system. A new helper (inode_nohighmem(inode)) needs to be used for pagecache symlink inodes; this is done for all in-tree cases. page_follow_link_light() is instrumented to yell about anything missed. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-12-06	ufs: get rid of ->setattr() for symlinks	Al Viro
	It was only needed for a couple of months in 2010, until UFS quota support got dropped. Since then it's equivalent to simple_setattr() (i.e. the default) for everything except regular files. And dropping it there allows converting all UFS symlinks to {page,simple}_symlink_inode_operations, getting rid of fs/ufs/symlink.c completely. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-09-09	fix ufs write vs readpage race when writing into a hole	Al Viro
	Followup to the UFS series - with the way we clear the new blocks (via buffer cache, possibly on more than a page's worth of file) we really should not insert a reference to the new block into the inode block tree until after we've cleared it. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-07-06	ufs_inode_get{frag,block}(): get rid of 'phys' argument	Al Viro
	Just pass NULL as locked_page in the case of the first block in the indirect chain. Old calling conventions aside, a reason for having 'phys' was that ufs_inode_getfrag() used to be able to do _two_ allocations - an indirect block and extending/reallocating a tail. We needed locked_page for the latter (it's data), but we also needed to figure out that the indirect block is metadata. So we used to pass a non-NULL locked_page in all cases and used NULL phys as an indication of being asked to allocate an indirect. With tail unpacking moved into a separate function we don't need those convolutions anymore. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06	ufs_getfrag_block(): tidy up a bit	Al Viro
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06	ufs_inode_getblock(): failure to read an indirect block is -EIO	Al Viro
	... and not "write to beginning of the disk", TYVM... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06	ufs_getfrag_block(): turn following indirects into a loop	Al Viro
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06	ufs_inode_getfrag(): pass index instead of 'fragment'	Al Viro
	same story as with ufs_inode_getblock() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06	ufs_inode_getfrag(): split extending the partial blocks off	Al Viro
	ufs_extend_tail() is handling that now. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06	ufs_inode_getblock(): pass indirect block number and full index	Al Viro
	... instead of messing with buffer_head. We can bloody well do sb_bread() in there. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06	ufs_inode_getblock(): pass index instead of 'fragment'	Al Viro
	The value passed to ufs_inode_getblock() as the 3rd argument had its lower bits ignored; the upper bits were shifted down and used, and they actually make sense - those are the _lower_ bits of the index in the indirect block (i.e. they form the index within a fragment within an indirect block). Pass those as the argument. The upper bits of the index (i.e. the number of the fragment within the indirect block) will join them shortly. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
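	To make the bit layout concrete, here is a minimal sketch assuming each fragment of an indirect block holds 2^shift block pointers; struct ind_index, split_index() and join_index() are hypothetical helpers, not UFS code. The low bits select a pointer within a fragment (what the 3rd argument now carries) and the high bits select the fragment within the indirect block.

```c
#include <assert.h>

/* Hypothetical decomposition of an index into an indirect block. */
struct ind_index {
	unsigned frag;		/* which fragment of the indirect block */
	unsigned in_frag;	/* which pointer within that fragment */
};

/* shift is log2 of the number of block pointers per fragment (assumed). */
static struct ind_index split_index(unsigned idx, unsigned shift)
{
	struct ind_index r;

	assert(shift < 32);
	r.frag = idx >> shift;
	r.in_frag = idx & ((1u << shift) - 1);
	return r;
}

/* Inverse of split_index(): put the fragment number back on top. */
static unsigned join_index(struct ind_index r, unsigned shift)
{
	assert(shift < 32 && r.in_frag < (1u << shift));
	return (r.frag << shift) | r.in_frag;
}
```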
2015-07-06	ufs_inode_get{frag,block}(): leave sb_getblk() to caller	Al Viro
	just return the damn block number Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06	ufs_getfrag_block(): get rid of macro jungles	Al Viro
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06	ufs_inode_get{frag,block}(): consolidate success exits	Al Viro
	These calling conventions are rudiments of pre-2.3 times; they really need to be sanitized. This is the first step; next will be _always_ returning a block number, instead of this "return a pointer to buffer_head, except when we get to the actual data" crap. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06	ufs: use the branch depth in ufs_getfrag_block()	Al Viro
	we'd already calculated it... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06	ufs: move calculation of offsets into ufs_getfrag_block()	Al Viro
	... and massage ufs_frag_map() to take those instead of the fragment number. As it is, we duplicate the damn thing on the write side, open-coded and bloody hard to follow. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06	ufs_inode_get{frag,block}(): get rid of retries	Al Viro
	We are holding ->truncate_mutex, so nobody else can alter our block pointers. Rechecks/retries were needed back when we only held BKL there, and had to cope with write_begin/writepage and writepage/truncate races. Can't happen anymore... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06	__ufs_truncate_blocks(): avoid excessive dirtying of indirect blocks	Al Viro
	There's a case when an indirect block gets dirtied for no good reason - when there's a hole starting in the middle of the area covered by it and spanning past its end, and truncate() is done precisely to the beginning of the hole. The block is obviously not modified at all - all removals happen beyond it. However, existing code ends up dirtying it just in case. It's trivial to fix, and while it's not a real bug by any stretch of the imagination, it makes the damn thing harder to follow. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06	free_full_branch(): don't bother modifying the block we are going to free	Al Viro
	Note that it's already made unreachable from the inode, so we don't have to worry about ufs_frag_map() walking into something already freed. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06	move marking inode dirty to the end of __ufs_truncate_blocks()	Al Viro
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06	free_full_branch(): saner calling conventions	Al Viro
	Have the caller fetch the block number and remove it from wherever it was. Pass the block number instead. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06	ufs_trunc_branch(): kill recursion	Al Viro
	turn recursion into a pair of loops Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06	ufs_trunc_branch(): massage towards killing recursion	Al Viro
	We always have 0 < depth2 <= depth in there, so
		if (--depth) {
			if (--depth2)
				A
			B
		} else {
			C	// not using depth2
		}
		D	// not using depth2
	is equivalent to
		if (--depth2)
			A with s/depth/depth - 1/
		if (--depth)
			B
		else
			C
		D
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
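	The equivalence can be checked mechanically. In this sketch A through D are hypothetical placeholders that only record the value of depth they would see (and, for A, depth2); before() is the first shape, after() the second, and shapes_agree() compares their traces for every 0 < depth2 <= depth <= max.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Records which placeholder steps ran and with what values. */
struct trace { char buf[64]; size_t len; };

static void rec(struct trace *t, char step, int depth, int depth2)
{
	size_t room = sizeof(t->buf) - t->len;
	int n = snprintf(t->buf + t->len, room, "%c(%d,%d) ", step, depth, depth2);

	if (n > 0 && (size_t)n < room)	/* keep len in bounds on truncation */
		t->len += (size_t)n;
}

static void before(int depth, int depth2, struct trace *t)
{
	assert(0 < depth2 && depth2 <= depth);
	if (--depth) {
		if (--depth2)
			rec(t, 'A', depth, depth2);
		rec(t, 'B', depth, 0);
	} else {
		rec(t, 'C', depth, 0);	/* not using depth2 */
	}
	rec(t, 'D', depth, 0);		/* not using depth2 */
}

static void after(int depth, int depth2, struct trace *t)
{
	assert(0 < depth2 && depth2 <= depth);
	if (--depth2)
		rec(t, 'A', depth - 1, depth2);	/* s/depth/depth - 1/ */
	if (--depth)
		rec(t, 'B', depth, 0);
	else
		rec(t, 'C', depth, 0);
	rec(t, 'D', depth, 0);
}

/* 1 if both shapes produce identical traces for all valid inputs. */
static int shapes_agree(int max)
{
	for (int depth = 1; depth <= max; depth++)
		for (int depth2 = 1; depth2 <= depth; depth2++) {
			struct trace x = { .len = 0 }, y = { .len = 0 };

			before(depth, depth2, &x);
			after(depth, depth2, &y);
			if (strcmp(x.buf, y.buf) != 0)
				return 0;
		}
	return 1;
}
```

	When depth == 1 the precondition forces depth2 == 1, so A is skipped in both shapes; that is the step that makes hoisting the depth2 test out of the depth test safe.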
2015-07-06	split ufs_truncate_branch() into full- and partial-branch variants	Al Viro
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06	ufs: unify the logics for collecting adjacent data blocks to free	Al Viro
	open-coded in several places... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06	ufs_trunc_branch(): separate the calls with non-NULL offsets	Al Viro
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06	ufs_trunc_branch(): never call with offsets != NULL && depth2 == 0	Al Viro
	For calls in __ufs_truncate_blocks() it's just a matter of not incrementing offsets[0] and not making that call - the immediately following loop will be executed one extra time and we'll be just fine. For the recursive call in ufs_trunc_branch() itself, just assign NULL to offsets if we would be about to make such a call. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06	__ufs_trunc_blocks(): turn the part after switch into a loop	Al Viro
	... and turn the switch into if (), since all cases with depth != 1 have just become identical. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06	__ufs_truncate_blocks(): unify freeing the full branches	Al Viro
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06	unify ufs_trunc_..indirect()	Al Viro
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06	ufs_trunc_..indirect(): more massage towards unifying	Al Viro
	Instead of manually checking that the array contains only zeroes, find the position of the last non-zero entry (in __ufs_truncate(), where we can conveniently do that) and use that to tell if there's any non-zero in the array tail passed to ufs_trunc_...indirect(). The goal of all that clumsiness is to fold these functions together. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
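	A minimal sketch of the "last non-zero" trick, assuming offsets[] is a plain array of unsigned indices; last_nonzero() and tail_nonzero() are hypothetical helpers, not the UFS functions. One backward scan up front replaces a zero check in every callee: the tail starting at from contains a non-zero entry exactly when the last non-zero position is at or beyond from.

```c
#include <assert.h>
#include <stdbool.h>

/* Index of the last non-zero entry in offsets[0..depth-1], or -1 if none. */
static int last_nonzero(const unsigned *offsets, int depth)
{
	int k = depth - 1;

	assert(depth >= 0);
	while (k >= 0 && offsets[k] == 0)
		k--;
	return k;
}

/* Does offsets[from..depth-1] contain any non-zero entry? */
static bool tail_nonzero(int last, int from)
{
	return last >= from;
}
```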
2015-07-06	ufs_trunc_...indirect(): pass the array of indices instead of offsets	Al Viro
	Rather than bitslicing the offset just formed as a sum of shifted indices, pass the array of those indices itself. NULL is used as the equivalent of "all zeroes" (== free the entire branch). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06	__ufs_truncate(): find cutoff distances into branches by offsets[] array	Al Viro
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06	ufs_trunc_dindirect(): pass the number of blocks to keep	Al Viro
	same as the previous two. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06	ufs_trunc_indirect(): pass the index of the first pointer to free	Al Viro
	... instead of file offset. Same cleanups as in the tindirect conversion in the previous commit. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06	ufs_trunc_tindirect(): pass the number of blocks to keep	Al Viro
	IOW, the distance of the cutoff from the beginning of the branch (in blocks). That (and the fact that the block just prior to the cutoff is guaranteed to be present) allows us to tell whether to free the triple indirect block just by looking at the offset. While we are at it, using u64 for the index in the block is wrong - those should be unsigned int. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06	ufs: beginning of __ufs_truncate_block() massage	Al Viro
	Use ufs_block_to_path() to find the cutoff path in the block pointers' tree. For now just use the information about the depth (to bypass the fully preserved subtrees); subsequent commits will use the information about the actual path. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06	ufs: the offsets ufs_block_to_path() puts into array are not sector_t	Al Viro
	The type makes no sense - those are indices in block number arrays, not block numbers. And no, UFS is not likely to grow indirect blocks with 4G pointers in them... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
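	For orientation, here is a sketch of the kind of path computation ufs_block_to_path() performs, modelled on the ext2 equivalent and assuming 12 direct pointers followed by single, double and triple indirect slots in the inode. The SK_* constants, sketch_block_to_path() and its ptrs_bits parameter are hypothetical; the real function's signature and details are not claimed to match. Each offsets[] entry is an index into an array of block pointers, which is why unsigned int is the natural type.

```c
#include <assert.h>

/* Hypothetical layout: 12 direct slots, then single/double/triple indirect. */
enum { SK_NDADDR = 12, SK_IND = 12, SK_DIND = 13, SK_TIND = 14 };

/*
 * Fill offsets[] with the path to file block i_block and return the depth
 * (1..4), or 0 if i_block is out of range.  ptrs_bits is log2 of the number
 * of pointers per indirect block (an assumed parameter for this sketch).
 */
static int sketch_block_to_path(unsigned long long i_block, unsigned ptrs_bits,
				unsigned offsets[4])
{
	unsigned long long ptrs, mask;

	assert(ptrs_bits >= 1 && ptrs_bits <= 20);
	ptrs = 1ULL << ptrs_bits;
	mask = ptrs - 1;

	if (i_block < SK_NDADDR) {
		offsets[0] = (unsigned)i_block;
		return 1;
	}
	i_block -= SK_NDADDR;
	if (i_block < ptrs) {
		offsets[0] = SK_IND;
		offsets[1] = (unsigned)i_block;
		return 2;
	}
	i_block -= ptrs;
	if (i_block < ptrs * ptrs) {
		offsets[0] = SK_DIND;
		offsets[1] = (unsigned)(i_block >> ptrs_bits);
		offsets[2] = (unsigned)(i_block & mask);
		return 3;
	}
	i_block -= ptrs * ptrs;
	if ((i_block >> (2 * ptrs_bits)) < ptrs) {
		offsets[0] = SK_TIND;
		offsets[1] = (unsigned)(i_block >> (2 * ptrs_bits));
		offsets[2] = (unsigned)((i_block >> ptrs_bits) & mask);
		offsets[3] = (unsigned)(i_block & mask);
		return 4;
	}
	return 0;	/* beyond what the triple indirect branch can map */
}
```

	With 4 pointers per indirect block (ptrs_bits == 2), file block 95 is the last one the triple indirect branch can reach and maps to the path {14, 3, 3, 3}; block 96 is out of range.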
2015-07-06	ufs: move truncate code into inode.c	Al Viro
	It is closely tied to the block pointer handling there, can benefit from existing helpers, etc. - no point keeping them apart. Trimmed the trailing whitespace in inode.c at the same time. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06	ufs: no retries are needed on truncate	Al Viro
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06	ufs: ufs_trunc_...() has exclusion with everything that might cause allocations	Al Viro
	Currently on lock_ufs(), eventually on a per-inode mutex. lock_ufs() used to be the mere BKL, which is much weaker, so it needed those rechecks. The BKL doesn't provide any exclusion once we lose the CPU; its blind replacement, OTOH, _does_. Making that per-filesystem was an atrocity, but at least we can simplify life here. And yes, we certainly need to make that sucker per-inode - these days its uses in inode.c and truncate.c are needed only to protect the block pointers. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06	ufs: ufs_trunc_direct() always returns 0	Al Viro
	make it return void Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06	ufs: kill lock_ufs()	Al Viro
	There were 3 remaining users; in two of them we took ->s_lock immediately after lock_ufs() and held it until just before unlock_ufs(); the third one (statfs) could not be called from itself or from the other two (remount and sync_fs). Just use ->s_lock in statfs and don't bother with lock_ufs at all. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06	ufs: don't use lock_ufs() for block pointers tree protection	Al Viro
	* stores to block pointers are under the per-inode seqlock (meta_lock) and mutex (truncate_mutex)
	* fetches of block pointers are either under truncate_mutex, or wrapped into a seqretry loop on meta_lock
	* all changes of ->i_size are under truncate_mutex and i_mutex
	* all changes of ->i_lastfrag are under truncate_mutex
	It's similar to what ext2 is doing; the main difference is that unlike ext2 we can't rely upon the atomicity of stores into block pointers - on UFS2 they are 64-bit. So we can't cut the corner when switching a pointer from NULL to non-NULL as we could in ext2_splice_branch(), and need to use meta_lock on all modifications. We use a seqlock where ext2 uses an rwlock; ext2 could probably also benefit from such a change... Another non-trivial difference is that with UFS we cannot have a reader grab truncate_mutex in case of a race - it has to keep retrying. That might be possible to change, but not until we lift tail unpacking several levels up in the call chain. After this commit we do NOT hold fs-wide serialization on accesses to block pointers anymore. Moreover, lock_ufs() can become a normal mutex now - it's only used in statfs, remount and sync_fs, and none of those uses are recursive. As a matter of fact, now it can be collapsed with ->s_lock, and eventually be replaced with saner per-cylinder-group spinlocks, but that's a separate story. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
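	A userspace sketch of the meta_lock idea, assuming C11 atomics and modelling the 64-bit UFS2 pointer as two 32-bit halves that a 32-bit CPU would store separately. struct sketch_blkptr, sketch_store() and sketch_fetch() are hypothetical, and the fence placement follows the usual C11 seqlock pattern rather than the kernel's seqlock_t implementation. Writers are assumed to be serialized externally, as truncate_mutex does above; readers never block and simply retry if a store was in progress or completed under them.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* A 64-bit block pointer stored as two halves, guarded by a sequence count. */
struct sketch_blkptr {
	atomic_uint seq;		/* odd while a store is in progress */
	_Atomic uint32_t lo, hi;	/* the two halves of the pointer */
};

/* Caller must hold the (assumed) writer-side mutex. */
static void sketch_store(struct sketch_blkptr *p, uint64_t v)
{
	unsigned s = atomic_load_explicit(&p->seq, memory_order_relaxed);

	assert((s & 1) == 0);	/* writers are serialized, so none is active */
	atomic_store_explicit(&p->seq, s + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&p->lo, (uint32_t)v, memory_order_relaxed);
	atomic_store_explicit(&p->hi, (uint32_t)(v >> 32), memory_order_relaxed);
	atomic_store_explicit(&p->seq, s + 2, memory_order_release);
}

/* Lockless reader: retry until both halves come from the same store. */
static uint64_t sketch_fetch(struct sketch_blkptr *p)
{
	unsigned s1, s2;
	uint32_t lo, hi;

	do {
		s1 = atomic_load_explicit(&p->seq, memory_order_acquire);
		lo = atomic_load_explicit(&p->lo, memory_order_relaxed);
		hi = atomic_load_explicit(&p->hi, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
		s2 = atomic_load_explicit(&p->seq, memory_order_relaxed);
	} while ((s1 & 1) || s1 != s2);

	return ((uint64_t)hi << 32) | lo;
}
```

	The halves are split on purpose: a reader that raced with a store could otherwise see a new low half paired with an old high half, which is exactly the torn 64-bit value the message says ext2-style unlocked stores cannot rule out on UFS2.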