path: root/fs
Age | Commit message | Author
2012-06-16 | Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs | Linus Torvalds
Pull btrfs compile warning fixes from Chris Mason.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: cast devid to unsigned long long for printk %llu
  Btrfs: init old_generation in get_old_root
2012-06-15 | Merge tag 'nfs-for-3.5-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs | Linus Torvalds
Pull NFS client bugfixes from Trond Myklebust:
 "Highlights include:
   - Fix a couple of mount regressions due to the recent cleanups.
   - Fix an Oops in the open recovery code
   - Fix an rpc_pipefs upcall hang that results from some of the net namespace work from 3.4.x (stable kernel candidate).
   - Fix a couple of write and o_direct regressions that were found at last week's Bakeathon testing event in Ann Arbor."
* tag 'nfs-for-3.5-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  NFS: add an endian notation for sparse
  NFSv4.1: integer overflow in decode_cb_sequence_args()
  rpc_pipefs: allow rpc_purge_list to take a NULL waitq pointer
  NFSv4 do not send an empty SETATTR compound
  NFSv2: EOF incorrectly set on short read
  NFS: Use the NFS_DEFAULT_VERSION for v2 and v3 mounts
  NFS: fix directio refcount bug on commit
  NFSv4: Fix unnecessary delegation returns in nfs4_do_open
  NFSv4.1: Convert another trivial printk into a dprintk
  NFS4: Fix open bug when pnfs module blacklisted
  NFS: Remove incorrect BUG_ON in nfs_found_client
  NFS: Map minor mismatch error to protocol not support error.
  NFS: Fix a commit bug
  NFS4: Set parsed mount data version to 4
  NFSv4.1: Ensure we clear session state flags after a session creation
  NFSv4.1: Convert a trivial printk into a dprintk
  NFSv4: Fix up decode_attr_mdsthreshold
  NFSv4: Fix an Oops in the open recovery code
  NFSv4.1: Fix a request leak on the back channel
2012-06-15 | Merge branch 'for-3.5' of git://linux-nfs.org/~bfields/linux | Linus Torvalds
Pull two nfsd bugfixes from J. Bruce Fields.
* 'for-3.5' of git://linux-nfs.org/~bfields/linux:
  nfsd4: BUG_ON(!is_spin_locked()) no good on UP kernels
  NFS: hard-code init_net for NFS callback transports
2012-06-15 | Btrfs: cast devid to unsigned long long for printk %llu | Chris Mason
Avoid a warning on 32-bit machines.
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
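A `%llu` conversion expects an argument of exactly type `unsigned long long`, while a 64-bit typedef such as the kernel's `u64` is not guaranteed to be that type on every architecture, so the portable fix is an explicit cast. A minimal userspace sketch of the same pattern, assuming `uint64_t` stands in for `u64` and `snprintf()` for `printk()`; `format_devid` is a hypothetical helper, not a btrfs function:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: print a 64-bit device id into buf.
 * The cast makes the argument type match %llu on every ABI,
 * whether uint64_t is `unsigned long` or `unsigned long long`. */
static int format_devid(char *buf, size_t len, uint64_t devid)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "devid %llu", (unsigned long long)devid);
}
```

For example, `format_devid(buf, sizeof(buf), 5)` writes `devid 5` and returns 7.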
2012-06-15 | Btrfs: init old_generation in get_old_root | Chris Mason
gcc was giving an uninit variable warning here. Strictly speaking we don't need to init it, but this will make things much less error-prone.
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-06-15 | Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs | Linus Torvalds
Pull btrfs update from Chris Mason:
 "The dates look like I had to rebase this morning because there was a compiler warning for a printk arg that I had missed earlier. These are all fixes, including one to prevent using stale pointers for device names, and lots of fixes around transaction abort cleanups (Josef, Liu Bo). Jan Schmidt also sent in a number of fixes for the new reference number tracking code. Liu Bo beat me to updating the MAINTAINERS file. Since he thought to also fix the git url, I kept his commit."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (24 commits)
  Btrfs: update MAINTAINERS info for BTRFS FILE SYSTEM
  Btrfs: destroy the items of the delayed inodes in error handling routine
  Btrfs: make sure that we've made everything in pinned tree clean
  Btrfs: avoid memory leak of extent state in error handling routine
  Btrfs: do not resize a seeding device
  Btrfs: fix missing inherited flag in rename
  Btrfs: fix incompat flags setting
  Btrfs: fix defrag regression
  Btrfs: call filemap_fdatawrite twice for compression
  Btrfs: keep inode pinned when compressing writes
  Btrfs: implement ->show_devname
  Btrfs: use rcu to protect device->name
  Btrfs: unlock everything properly in the error case for nocow
  Btrfs: fix btrfs_destroy_marked_extents
  Btrfs: abort the transaction if the commit fails
  Btrfs: wake up transaction waiters when aborting a transaction
  Btrfs: fix locking in btrfs_destroy_delayed_refs
  Btrfs: pass locked_page into extent_clear_unlock_delalloc if theres an error
  Btrfs: fix race in tree mod log addition
  Btrfs: add btrfs_next_old_leaf
  ...
2012-06-15 | Btrfs: destroy the items of the delayed inodes in error handling routine | Miao Xie
The items of the delayed inodes were never freed; this patch fixes that.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-06-15 | Btrfs: make sure that we've made everything in pinned tree clean | Liu Bo
Since we have two trees for recording pinned extents, we need to go through both of them to make sure that we've cleaned everything up.
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-06-15 | Btrfs: avoid memory leak of extent state in error handling routine | Liu Bo
We forgot to clear the extent states in the pinned tree, which results in a space counter mismatch and a memory leak:
WARNING: at fs/btrfs/extent-tree.c:7537 btrfs_free_block_groups+0x1f3/0x2e0 [btrfs]()
...
space_info 2 has 8380416 free, is not full
space_info total=12582912, used=4096, pinned=4096, reserved=0, may_use=0, readonly=4194304
btrfs state leak: start 29364224 end 29376511 state 1 in tree ffff880075f20090 refs 1
...
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-06-15 | Btrfs: do not resize a seeding device | Liu Bo
Seeding devices are not supposed to change any more.
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-06-15 | Btrfs: fix missing inherited flag in rename | Liu Bo
When we move a file into a directory with the compression flag, we need to inherit BTRFS_INODE_COMPRESS and clear BTRFS_INODE_NOCOMPRESS as well. But if we move a file into a directory without the compression flag, we need to clear both of them. This is how our setflags code deals with the compression flag, so keep the same behaviour here.
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
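The inheritance rule above, expressed as a small bit-manipulation sketch. The flag values are hypothetical stand-ins for `BTRFS_INODE_COMPRESS` and `BTRFS_INODE_NOCOMPRESS` (the real definitions live in the btrfs headers), and `inherit_compress_flags` is an illustrative name, not a btrfs function:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit values; not the real btrfs definitions. */
#define INODE_COMPRESS   (UINT32_C(1) << 0)
#define INODE_NOCOMPRESS (UINT32_C(1) << 1)

/* Return the renamed inode's flags after moving it into a directory
 * whose flags are dir_flags. Unrelated bits are left untouched. */
static uint32_t inherit_compress_flags(uint32_t inode_flags, uint32_t dir_flags)
{
	if (dir_flags & INODE_COMPRESS) {
		/* Directory compresses: inherit COMPRESS, drop NOCOMPRESS. */
		inode_flags &= ~INODE_NOCOMPRESS;
		inode_flags |= INODE_COMPRESS;
	} else {
		/* Directory does not compress: clear both flags. */
		inode_flags &= ~(INODE_COMPRESS | INODE_NOCOMPRESS);
	}
	return inode_flags;
}
```

Clearing both flags in the second branch means the file falls back to whatever default applies, rather than keeping a stale per-file setting from its old directory.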
2012-06-15 | Merge branch 'for-chris' of git://git.jan-o-sch.net/btrfs-unstable into for-linus | Chris Mason
2012-06-14 | Btrfs: fix incompat flags setting | Li Zefan
It's a bug, but it happens to work, as BTRFS_COMPRESS_LZO == 2, which has only one bit set.
Signed-off-by: Li Zefan <lizefan@huawei.com>
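Why a bitwise test on an enumerated value can "happen to work": `t & X` only behaves like `t == X` while `X` has a single bit set and no other defined value shares that bit. A sketch, assuming NONE = 0 and ZLIB = 1 alongside the LZO == 2 stated above, plus a hypothetical future value 3 to show where `&` breaks; all names here are illustrative, not the btrfs identifiers:

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed values; only LZO == 2 comes from the commit message.
 * COMPRESS_FUTURE is hypothetical. */
enum compress_type {
	COMPRESS_NONE = 0,
	COMPRESS_ZLIB = 1,
	COMPRESS_LZO = 2,
	COMPRESS_FUTURE = 3,
};

/* Buggy: treats an enumerated value as if it were a bitmask. */
static bool uses_lzo_bitwise(enum compress_type t)
{
	return (t & COMPRESS_LZO) != 0;
}

/* Correct: compares the value itself. */
static bool uses_lzo(enum compress_type t)
{
	return t == COMPRESS_LZO;
}
```

With only NONE, ZLIB and LZO defined, both functions agree on every value; the bitwise version only misfires once a value such as 3 exists.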
2012-06-14 | Btrfs: fix defrag regression | Li Zefan
If a file has 3 small extents:
| ext1 | ext2 | ext3 |
Running "btrfs fi defrag" will only defrag the last two extents, if those extent mappings haven't been read into memory from disk. This bug was introduced by commit 17ce6ef8d731af5edac8c39e806db4c7e1f6956f ("Btrfs: add a check to decide if we should defrag the range"). The cause is that the commit looked up the previous and next extents using only lookup_extent_mapping(). While at it, remove the code that checks the previous extent, since it's sufficient to check the next extent.
Signed-off-by: Li Zefan <lizefan@huawei.com>
2012-06-14 | Btrfs: call filemap_fdatawrite twice for compression | Josef Bacik
I removed this in an earlier commit and I was wrong. Because compression can return from filemap_fdatawrite() without having actually set any of its pages as writeback(), it can make filemap_fdatawait() do essentially nothing, and then we won't find any ordered extents because they may not have been created yet. So not only does this make fsync() completely useless, but it will also screw up if you truncate on a non-page aligned offset, since we zero out the end and then wait on ordered extents and then call drop caches. We can drop the cache before the io completes, and then when we try to unpin the extent we just wrote we won't find it and everything goes sideways. So fix this by putting it back and put a giant comment there to keep me from trying to remove it in the future. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
2012-06-14 | Btrfs: keep inode pinned when compressing writes | Josef Bacik
A user reported lots of problems using compression on the new code and it turns out part of the problem was that igrab() was failing when we added a new ordered extent. This is because when writing out an inode under compression we immediately return without actually doing anything to the pages, and then in another thread at some point down the line actually do the ordered dance. The problem is that between the point where we start writeback and the point where we actually add the ordered extent, we could be trying to reclaim the inode, which makes igrab() return NULL. So we need to do an igrab() when we create the async extent and then drop it when we are done with it. This makes sure we stay pinned in memory until the ordered extent can get a reference on it and we are good to go. With this patch we no longer panic in btrfs_finish_ordered_io(). Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
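The lifetime problem can be modelled with a reference count: a reference taken when the async extent is created keeps the inode from being reclaimed until the ordered extent has its own reference. This is a simplified, hypothetical model (`model_inode`, `model_igrab`, `model_iput` are illustrative names, not the VFS API); in it, dropping the last reference immediately marks the inode as being freed, and grabbing fails from then on, much as `igrab()` returns NULL for an inode under reclaim:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical inode model; not the VFS struct inode. */
struct model_inode {
	int refcount;
	bool freeing;		/* reclaim has started */
};

/* Take a reference unless reclaim has begun, in which case fail,
 * mirroring how igrab() can return NULL. */
static struct model_inode *model_igrab(struct model_inode *inode)
{
	if (inode->freeing)
		return NULL;
	inode->refcount++;
	return inode;
}

/* Drop a reference; in this model the last put starts reclaim. */
static void model_iput(struct model_inode *inode)
{
	assert(inode->refcount > 0);
	if (--inode->refcount == 0)
		inode->freeing = true;
}
```

Usage: pin with `model_igrab()` when queueing the async work and call `model_iput()` once the ordered extent holds its own reference; without the pin, the writer's final put lets reclaim start and the later grab returns NULL.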
2012-06-14 | Btrfs: implement ->show_devname | Josef Bacik
Because btrfs can remove the device that was mounted, we need to have a ->show_devname so that in this case we can print out some other device in the file system to /proc/mounts. So if there are multiple devices in a btrfs file system we will just print the device with the lowest devid that we can find. This will make everything consistent and deal with device removal properly. The drawback is that if you mount with a device that is higher than the lowest devid, it won't show up as the mounted device in /proc/mounts, but this is a small price to pay. This was inspired by Miao Xie's patch. Thanks,
Reviewed-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <josef@redhat.com>
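Choosing the device to report reduces to a minimum search over devid. A sketch with a hypothetical, simplified device record (`dev_model` is illustrative, not `struct btrfs_device`), assuming a missing device is marked by a NULL name and must be skipped:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified device record; not struct btrfs_device. */
struct dev_model {
	uint64_t devid;
	const char *name;	/* NULL marks a missing device (assumption) */
};

/* Return the name of the present device with the lowest devid,
 * or NULL if no device is present. */
static const char *lowest_devid_name(const struct dev_model *devs, size_t n)
{
	const struct dev_model *best = NULL;

	assert(devs != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (devs[i].name == NULL)
			continue;
		if (best == NULL || devs[i].devid < best->devid)
			best = &devs[i];
	}
	return best ? best->name : NULL;
}
```

Because the choice depends only on the set of present devices, the reported name stays stable no matter which device was passed to mount.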
2012-06-14 | Btrfs: use rcu to protect device->name | Josef Bacik
Al pointed out that we can just toss out the old name on a device and add a new one arbitrarily, so anybody who uses device->name in printk could possibly use freed memory. Instead of adding locking around all of this he suggested doing it with RCU, so I've introduced a struct rcu_string that does just that and have gone through and protected all accesses to device->name that aren't under the uuid_mutex with rcu_read_lock(). This protects us and I will use it for dealing with removing the device that we used to mount the file system in a later patch. Thanks,
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Josef Bacik <josef@redhat.com>
2012-06-14 | Btrfs: unlock everything properly in the error case for nocow | Josef Bacik
I was getting hung on umount when a transaction was aborted because a range of one of the free space inodes was still locked. This is because the nocow stuff doesn't unlock anything on error. This fixed the problem and I verified that is what was happening. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
2012-06-14 | Btrfs: fix btrfs_destroy_marked_extents | Josef Bacik
We're forcing the ebs to have their ref count set to 1 so that invalidatepage works, but this breaks lots of things (root nodes, for example) and is just plain wrong; we don't need to just evict all of this stuff. Also drop the invalidatepage altogether and add a page_cache_release(). With this patch we no longer hang when trying to access the root nodes after an aborted transaction and we no longer leak memory. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
2012-06-14 | Btrfs: abort the transaction if the commit fails | Josef Bacik
If a transaction commit fails we don't abort it, so we don't set an error on the file system. This patch fixes that by actually calling the abort stuff and then adding a check for a fs error in the transaction start stuff to make sure it is caught properly. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
2012-06-14 | Btrfs: wake up transaction waiters when aborting a transaction | Josef Bacik
I was getting lots of hung tasks and a NULL pointer dereference because we are not cleaning up the transaction properly when it aborts. First we need to reset the running_transaction to NULL so we don't get a bad dereference for any start_transaction callers after this. Also we cannot rely on waitqueue_active() since it's just a list_empty(), so just call wake_up() directly since that will do the barrier for us and such. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
2012-06-14 | Btrfs: fix locking in btrfs_destroy_delayed_refs | Josef Bacik
The transaction abort stuff was throwing warnings from the list debugging code because we do a list_del_init outside of the delayed_refs spin lock. The delayed refs locking makes baby Jesus cry so it's not hard to get wrong, but we need to take the ref head mutex to make sure it's not being processed currently, and so if it is we need to drop the spin lock and then take and drop the mutex and do the search again. If we can take the mutex then we can safely remove the head from the list and carry on. Now when the transaction aborts I don't get the list debugging warnings. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
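The retry protocol above is a common pattern: in the kernel you cannot sleep on a mutex while holding a spin lock, so the code tries the per-head mutex first and, if it is busy, drops the list lock, waits for the holder, and searches again. A userspace sketch with POSIX threads, where one `pthread_mutex_t` stands in for the delayed_refs spin lock and a boolean stands in for list membership; `ref_head`, `list_lock` and `destroy_head` are hypothetical names:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical ref head: `mutex` is held while the head is processed. */
struct ref_head {
	pthread_mutex_t mutex;
	bool on_list;
};

/* Stands in for the delayed_refs spin lock protecting the list. */
static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;

/* Remove `head` from the list only while holding its mutex, so it cannot
 * be mid-processing. We block on the head mutex only after dropping the
 * list lock, then re-check everything from the top. */
static void destroy_head(struct ref_head *head)
{
	assert(head != NULL);
again:
	pthread_mutex_lock(&list_lock);
	if (!head->on_list) {
		pthread_mutex_unlock(&list_lock);
		return;
	}
	if (pthread_mutex_trylock(&head->mutex) != 0) {
		/* Someone is processing the head: wait outside the list lock. */
		pthread_mutex_unlock(&list_lock);
		pthread_mutex_lock(&head->mutex);
		pthread_mutex_unlock(&head->mutex);
		goto again;	/* the list may have changed meanwhile */
	}
	head->on_list = false;	/* safe: list lock and head mutex both held */
	pthread_mutex_unlock(&head->mutex);
	pthread_mutex_unlock(&list_lock);
}
```

Taking and immediately dropping the head mutex is only a way to wait for the current holder; the `goto again` is essential because the head may have been removed or changed while the list lock was released.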
2012-06-14 | Btrfs: pass locked_page into extent_clear_unlock_delalloc if theres an error | Josef Bacik
While doing my enospc work I got a transaction abort that resulted in a panic when we tried to unlock_page() an already unlocked page. This is because we aren't calling extent_clear_unlock_delalloc with the locked page, so it was unlocking all the pages in the range. This is wrong since __extent_writepage expects to still have the page locked unless we return *page_started as 1. This should keep us from panicking. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
2012-06-14 | nfsd4: BUG_ON(!is_spin_locked()) no good on UP kernels | J. Bruce Fields
Most frequent symptom was a BUG triggering in expire_client, with the server locking up shortly thereafter. Introduced by 508dc6e110c6dbdc0bbe84298ccfe22de7538486 "nfsd41: free_session/free_client must be called under the client_lock".
Cc: stable@kernel.org
Cc: Benny Halevy <bhalevy@tonian.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-06-14 | NFS: hard-code init_net for NFS callback transports | Stanislav Kinsbursky
When the mount namespace is destroyed on child reaper exit, nsproxy has already been zeroed at that point, so dereferencing it is invalid. This patch hard-codes "init_net" for all network namespace references in the NFS callback services. This will be fixed with proper NFS callback containerization.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-06-14 | Btrfs: fix race in tree mod log addition | Jan Schmidt
When adding to the tree modification log, we grab two locks at different stages. We must not drop the outer lock until we're done with the section protected by the inner lock. This moves the unlock call for the outer lock to the appropriate position.
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-06-14 | Btrfs: add btrfs_next_old_leaf | Jan Schmidt
To make sense of the tree mod log, the backref walker not only needs btrfs_search_old_slot; it also called btrfs_next_leaf, which in turn called btrfs_search_slot. This obviously didn't give the correct result. This commit adds btrfs_next_old_leaf, a drop-in replacement for btrfs_next_leaf with a time_seq parameter. If it is zero, it behaves exactly like btrfs_next_leaf. If it is non-zero, it will use btrfs_search_old_slot with this time_seq parameter.
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-06-14 | Btrfs: fix return value for __tree_mod_log_oldest_root | Jan Schmidt
In __tree_mod_log_oldest_root() we must return the found operation even if it's not a ROOT_REPLACE operation. Otherwise, the caller assumes that there are no operations to be rewound and returns immediately. The code in the caller is modified to improve readability.
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-06-14 | Btrfs: use btrfs_read_lock_root_node in get_old_root | Jan Schmidt
get_old_root could race with root node updates because we weren't locking the node early enough. Use btrfs_read_lock_root_node to grab the root locked in the very beginning and release the lock as soon as possible (just like btrfs_search_slot does).
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-06-14 | Btrfs: remove obsolete btrfs_next_leaf call from __resolve_indirect_ref | Jan Schmidt
When resolving indirect refs, we used to call btrfs_next_leaf in case we didn't find an exact match. While we should find exact matches most of the time, in case we don't, we must continue searching. Treating those matches differently depending on the level we're searching doesn't make sense. Even worse, we might end up searching for a key larger than the largest, in which case there is no next_leaf and subsequent jobs would fail. This commit drops the bogus lines.
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-06-12 | Merge tag 'writeback-lock-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux | Linus Torvalds
Pull writeback locking fix from Wu Fengguang:
 "fix unbalanced wb->list_lock in 3.5-rc1"
* tag 'writeback-lock-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux:
  writeback: Fix lock imbalance in writeback_sb_inodes()
2012-06-12 | NFS: add an endian notation for sparse | Dan Carpenter
This is supposed to be a __be32 value. Sparse complains a lot:
fs/nfs/callback_xdr.c:699:30: warning: incorrect type in initializer (different base types)
fs/nfs/callback_xdr.c:699:30:    expected unsigned int [unsigned] status
fs/nfs/callback_xdr.c:699:30:    got restricted __be32 const [usertype] csr_status
fs/nfs/callback_xdr.c:715:9: warning: cast to restricted __be32
fs/nfs/callback_xdr.c:716:16: warning: incorrect type in return expression (different base types)
fs/nfs/callback_xdr.c:716:16:    expected restricted __be32
fs/nfs/callback_xdr.c:716:16:    got unsigned int [unsigned] status
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
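Sparse flags these lines because `__be32` is a restricted byte-order type that must not be mixed with plain integers; the fix is to declare the variable with that type. A rough userspace analogue, using a single-member struct (a hypothetical stand-in for sparse's checking, named `be32_t` here for illustration) so that the compiler itself rejects mixing host-order and wire-order values, with `htonl()`/`ntohl()` doing the conversions:

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>

/* Hypothetical wire-order wrapper: a distinct type, so passing a plain
 * uint32_t where a big-endian value is expected fails to compile. */
typedef struct {
	uint32_t raw;	/* stored in network (big-endian) byte order */
} be32_t;

static be32_t to_be32(uint32_t host)
{
	be32_t v = { htonl(host) };
	return v;
}

static uint32_t from_be32(be32_t v)
{
	return ntohl(v.raw);
}
```

Unlike sparse annotations, the struct wrapper costs nothing at run time but does require every conversion to go through the helpers, which is the discipline the warnings above enforce.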
2012-06-12 | NFSv4.1: integer overflow in decode_cb_sequence_args() | Dan Carpenter
This seems like it could overflow on 32 bits. Use kmalloc_array() which has overflow protection built in.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
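The protection `kmalloc_array()` adds is a check that `n * size` does not wrap before allocating. The same check in userspace, as a hypothetical `malloc_array()` helper (most `calloc()` implementations perform an equivalent check, but also zero the memory):

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocate an array of n elements of `size` bytes,
 * returning NULL instead of a too-small buffer if n * size overflows. */
static void *malloc_array(size_t n, size_t size)
{
	if (size != 0 && n > SIZE_MAX / size)
		return NULL;	/* n * size would wrap around */
	return malloc(n * size);
}
```

Without the check, an attacker-controlled count on a 32-bit machine could make `n * size` wrap to a small value, and the decoder would then write past the end of the undersized buffer.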
2012-06-12 | exofs: fix sparse non-ANSI function warning | Randy Dunlap
Fix sparse non-ANSI function warning:
fs/exofs/sys.c:112:28: warning: non-ANSI function declaration of function 'exofs_sysfs_dbg_print'
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-06-11 | NFSv4 do not send an empty SETATTR compound | Andy Adamson
The ATTR_OPEN check in commit 536e43d12b9517bbbf6114cd1a12be27857a4d7a can result in an ia_valid with only ATTR_FILE set, and no NFS_VALID_ATTRS attributes to request from the server.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-11 | NFSv2: EOF incorrectly set on short read | Sachin Prabhu
In cases where the server returns fewer bytes than those requested, we can incorrectly set the eof flag for the file. Fixing this allows the request to be retried with updated offset and count arguments.
Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
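NFSv2 READ replies carry no eof flag, only the data and the file's attributes, so the client has to infer end-of-file. A sketch of that inference, assuming it compares the end of the read against the file size from the attributes (`v2_read_reply` and both functions are hypothetical names): basing it on the requested count rather than the returned count marks a short read as EOF.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical summary of an NFSv2 READ reply. */
struct v2_read_reply {
	uint32_t count;		/* bytes actually returned (may be short) */
	uint64_t file_size;	/* size from the returned attributes */
};

/* Buggy inference: uses the number of bytes requested. */
static bool eof_from_request(uint64_t offset, uint32_t requested,
			     const struct v2_read_reply *r)
{
	return offset + requested >= r->file_size;
}

/* Fixed inference: uses the number of bytes the server returned, so a
 * short read leaves eof clear and the caller can retry from
 * offset + r->count with the remaining count. */
static bool eof_from_reply(uint64_t offset, const struct v2_read_reply *r)
{
	return offset + r->count >= r->file_size;
}
```

For an 8192-byte file, a request for 8192 bytes at offset 0 that returns only 4096 is EOF under the buggy rule but not under the fixed one, so the retry at offset 4096 is allowed to happen.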
2012-06-09 | NFS: Use the NFS_DEFAULT_VERSION for v2 and v3 mounts | Bryan Schumaker
Older versions of nfs-utils don't always pass a "vers=" mount option for NFS. This could lead to attempts at using NFS v0 due to a zeroed out nfs_parsed_mount_data struct. I solve this by setting the default NFS version to NFS_DEFAULT_VERSION in the v2 and v3 cases (v4 has already been taken care of by a similar patch).
Reported-by: Joerg Roedel <joro@&bytes.org>
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-09 | NFS: fix directio refcount bug on commit | Fred Isaman
This reverts a hunk from commit 04277086577 "NFS: Clean up - Simplify reference counting in fs/nfs/direct.c". The cleanups in that patch affect the write path, but by the time processing hits commit the removed reference has been added back by nfs_scan_commit_list(). Without this reversion, any page that is sent to commit holds on to an unbalanced reference that is never freed. The immediate effect is an imbalance over the wire between OPENs and CLOSEs.
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-09 | writeback: Fix lock imbalance in writeback_sb_inodes() | Jan Kara
Fix bug introduced by 169ebd90. We have to have wb_list_lock locked when restarting the writeback loop after having waited for inode writeback.
Bug description by Ted Tso: I can reproduce this fairly easily by using ext4 w/o a journal, running under KVM with 1024megs memory, with fsstress (xfstests #13):
[ 45.153294] =====================================
[ 45.154784] [ BUG: bad unlock balance detected! ]
[ 45.155591] 3.5.0-rc1-00002-gb22b1f1 #124 Not tainted
[ 45.155591] -------------------------------------
[ 45.155591] flush-254:16/2499 is trying to release lock (&(&wb->list_lock)->rlock) at:
[ 45.155591] [<c022c3da>] writeback_sb_inodes+0x160/0x327
[ 45.155591] but there are no more locks to release!
Reported-by: Theodore Ts'o <tytso@mit.edu>
Tested-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
2012-06-08 | Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 | Linus Torvalds
Pull ext4 bug fixes from Theodore Ts'o:
 "This update contains two bug fixes, both destined for the stable tree. Perhaps the most important is one which fixes ext4 when used with file systems originally formatted for use with ext3, but then later converted to take advantage of ext4."
* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: don't set i_flags in EXT4_IOC_SETFLAGS
  ext4: fix the free blocks calculation for ext3 file systems w/ uninit_bg
2012-06-08 | Merge tag 'upstream-3.5-rc2' of git://git.infradead.org/linux-ubifs | Linus Torvalds
Pull UBI/UBIFS fixes from Artem Bityutskiy:
 "Fix UBI and UBIFS - they refuse to work without debugfs. This was broken by the 3.5-rc1 UBI/UBIFS changes when we removed the debugging Kconfig switches. Also, correct locking in 'ubi_wl_flush()' - it was extended to support flushing a specific LEB in 3.5-rc1, and the locking was sub-optimal."
* tag 'upstream-3.5-rc2' of git://git.infradead.org/linux-ubifs:
  UBI: correct ubi_wl_flush locking
  UBIFS: fix debugfs-less systems support
  UBI: fix debugfs-less systems support
2012-06-08 | Revert "vfs: stop d_splice_alias creating directory aliases" | Linus Torvalds
This reverts commit 7732a557b1342c6e6966efb5f07effcf99f56167 (and commit 3f50fff4dace23d3cfeb195d5cd4ee813cee68b7, which was a follow-up cleanup).
We're chasing an elusive bug that Dave Jones can apparently reproduce using his system call fuzzer tool, and that looks like some kind of locking ordering problem on the directory i_mutex chain. Our i_mutex locking is rather complex, and depends on the topological ordering of the directories, which is why we have been very wary of splicing directory entries around.
Of course, we really don't want to ever see aliased unconnected directories anyway, so none of this should ever happen, but this revert aims to basically get us back to a known older state.
Bruce points to some of the previous discussion at http://marc.info/?i=<20110310105821.GE22723@ZenIV.linux.org.uk> and in particular a long post from Neil: http://marc.info/?i=<20110311150749.2fa2be66@notabene.brown>
It should be noted that it's possible that Dave's problems come from other changes altogether, including possibly just the fact that Dave constantly is teaching his fuzzer new tricks. So what appears to be a new bug could in fact be an old one that just gets newly triggered, but reverting these patches as "still under heavy discussion" is the right thing regardless.
Requested-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-06-08 | NFSv4: Fix unnecessary delegation returns in nfs4_do_open | Trond Myklebust
While nfs4_do_open() expects the fmode argument to be restricted to combinations of FMODE_READ and FMODE_WRITE, both nfs4_atomic_open() and nfs4_proc_create will pass the nfs_open_context->mode, which contains the full fmode_t. This patch ensures that nfs4_do_open strips the other fmode_t bits, fixing a problem in which the nfs4_do_open call would result in an unnecessary delegation return.
Reported-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
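A sketch of why stray fmode_t bits can matter. `FMODE_READ` and `FMODE_WRITE` are assumed to be 0x1 and 0x2 as in the kernel headers; `FMODE_OTHER` is a hypothetical stand-in for the remaining fmode_t bits, and `delegation_covers()` is a hypothetical simplification of deciding whether an existing delegation satisfies an open, not the NFS client's actual logic:

```c
#include <assert.h>
#include <stdbool.h>

#define FMODE_READ  0x1u	/* assumed kernel value */
#define FMODE_WRITE 0x2u	/* assumed kernel value */
#define FMODE_OTHER 0x20u	/* hypothetical non-access bit */

/* Keep only the access bits that nfs4_do_open() expects. */
static unsigned int open_access_mode(unsigned int fmode)
{
	return fmode & (FMODE_READ | FMODE_WRITE);
}

/* Hypothetical check: a delegation covers an open only if it grants
 * every bit the open asks for. */
static bool delegation_covers(unsigned int deleg_mode, unsigned int fmode)
{
	return (fmode & ~deleg_mode) == 0;
}
```

With an unmasked mode such as `FMODE_READ | FMODE_OTHER`, a read delegation appears insufficient and would be returned needlessly; masking first makes the comparison see only the access bits.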
2012-06-07 | Revert "mm: correctly synchronize rss-counters at exit/exec" | Linus Torvalds
This reverts commit 40af1bbdca47e5c8a2044039bb78ca8fd8b20f94.
It's horribly and utterly broken for at least the following reasons:
 - calling sync_mm_rss() from mmput() is fundamentally wrong, because there's absolutely no reason to believe that the task that does the mmput() always does it on its own VM. Example: fork, ptrace, /proc - you name it.
 - calling it *after* having done mmdrop() on it is doubly insane, since the mm struct may well be gone now.
 - testing mm against NULL before you call it is insane too, since a NULL mm there would have caused oopses long before.
.. and those are just the three bugs I found before I decided to give up looking for more and revert it asap. I should have caught it before I even took it, but I trusted Andrew too much.
Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Markus Trippelsdorf <markus@trippelsdorf.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-06-07 | ext4: don't set i_flags in EXT4_IOC_SETFLAGS | Tao Ma
Commit 7990696 uses the ext4_{set,clear}_inode_flags() functions to change the i_flags atomically but fails to remove the erroneous setting of i_flags. So we still have the problem of trashing state flags. Fix this by removing the assignment.
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org
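The difference between assigning the whole flags word and updating only the user-modifiable bits, in a sketch with a hypothetical layout: purely for illustration, internal state is assumed to share a 64-bit word with the user-visible flags and to live above bit 31 (`USER_MODIFIABLE` and `STATE_BIT` are illustrative values, not ext4's). Setting or clearing each modifiable bit individually, as the helpers named above do, has the same net effect as the masked update shown here:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: low bits may be changed by the ioctl,
 * STATE_BIT is internal state that must survive. */
#define USER_MODIFIABLE UINT64_C(0x00000000000000ff)
#define STATE_BIT       (UINT64_C(1) << 32)

/* Buggy: the new flags are computed as a 32-bit value and assigned
 * to the whole word, so bits above 31 are silently cleared. */
static uint64_t setflags_assign(uint64_t current, uint32_t requested)
{
	uint32_t flags = (uint32_t)((requested & USER_MODIFIABLE) |
				    (current & ~USER_MODIFIABLE));
	return flags;
}

/* Fixed: touch only the modifiable bits and keep everything else. */
static uint64_t setflags_masked(uint64_t current, uint32_t requested)
{
	return (current & ~USER_MODIFIABLE) | (requested & USER_MODIFIABLE);
}
```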
2012-06-07 | ext4: fix the free blocks calculation for ext3 file systems w/ uninit_bg | Theodore Ts'o
Ext3 filesystems that are converted to use as many ext4 file system features as possible will enable uninit_bg to speed up e2fsck times. These file systems will have a native ext3 layout of inode tables and block allocation bitmaps (as opposed to ext4's flex_bg layout). Unfortunately, in these cases, when first allocating a block in an uninitialized block group, ext4 would incorrectly calculate the number of free blocks in that block group, and then erroneously report that the file system was corrupt:
EXT4-fs error (device vdd): ext4_mb_generate_buddy:741: group 30, 32254 clusters in bitmap, 32258 in gd
This problem can be reproduced via:
    mke2fs -q -t ext4 -O ^flex_bg /dev/vdd 5g
    mount -t ext4 /dev/vdd /mnt
    fallocate -l 4600m /mnt/test
The problem was caused by a boneheaded mistake in the check to see if a particular metadata block was part of the block group. Many thanks to Kees Cook for finding and bisecting the buggy commit which introduced this bug (commit fd034a84e1, present since v3.2).
Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
Reported-by: Kees Cook <keescook@chromium.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Tested-by: Kees Cook <keescook@chromium.org>
Cc: stable@kernel.org
2012-06-07 | mm: correctly synchronize rss-counters at exit/exec | Konstantin Khlebnikov
mm->rss_stat counters have per-task delta: task->rss_stat. Before changing task->mm pointer the kernel must flush this delta with sync_mm_rss().
do_exit() already calls sync_mm_rss() to flush the rss-counters before committing the rss statistics into task->signal->maxrss, taskstats, audit and other stuff. Unfortunately the kernel does this before calling mm_release(), which can call put_user() for processing task->clear_child_tid. So at this point we can trigger page-faults and task->rss_stat becomes non-zero again. As a result mm->rss_stat becomes inconsistent and check_mm() will print something like this:
| BUG: Bad rss-counter state mm:ffff88020813c380 idx:1 val:-1
| BUG: Bad rss-counter state mm:ffff88020813c380 idx:2 val:1
This patch moves sync_mm_rss() into mm_release(), and moves mm_release() out of do_exit() and calls it earlier. After mm_release() there should be no pagefaults.
[akpm@linux-foundation.org: tweak comment]
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: <stable@vger.kernel.org> [3.4.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-06-07 | NFSv4.1: Convert another trivial printk into a dprintk | Trond Myklebust
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-07 | NFS4: Fix open bug when pnfs module blacklisted | Fred Isaman
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>