aboutsummaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2012-05-06Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs fixes from Chris Mason: "The big ones here are a memory leak we introduced in rc1, and a scheduling while atomic if the transid on disk doesn't match the transid we expected. This happens for corrupt blocks, or out of date disks. It also fixes up the ioctl definition for our ioctl to resolve logical inode numbers. The __u32 was a merging error and doesn't match what we ship in the progs." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: Btrfs: avoid sleeping in verify_parent_transid while atomic Btrfs: fix crash in scrub repair code when device is missing btrfs: Fix mismatching struct members in ioctl.h Btrfs: fix page leak when allocing extent buffers Btrfs: Add properly locking around add_root_to_dirty_list
2012-05-06Btrfs: avoid sleeping in verify_parent_transid while atomicChris Mason
verify_parent_transid needs to lock the extent range to make sure no IO is underway, and so it can safely clear the uptodate bits if our checks fail. But, a few callers are using it with spinlocks held. Most of the time, the generation numbers are going to match, and we don't want to switch to a blocking lock just for the error case. This adds an atomic flag to verify_parent_transid, and changes it to return EAGAIN if it needs to block to properly verifiy things. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-05-04hfsplus: Fix potential buffer overflowsGreg Kroah-Hartman
Commit ec81aecb2966 ("hfs: fix a potential buffer overflow") fixed a few potential buffer overflows in the hfs filesystem. But as Timo Warns pointed out, these changes also need to be made on the hfsplus filesystem as well. Reported-by: Timo Warns <warns@pre-sense.de> Acked-by: WANG Cong <amwang@redhat.com> Cc: Alexey Khoroshilov <khoroshilov@ispras.ru> Cc: Miklos Szeredi <mszeredi@suse.cz> Cc: Sage Weil <sage@newdream.net> Cc: Eugene Teo <eteo@redhat.com> Cc: Roman Zippel <zippel@linux-m68k.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Dave Anderson <anderson@redhat.com> Cc: stable <stable@vger.kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-04Merge git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull CIFS fixes from Steve French. * git://git.samba.org/sfrench/cifs-2.6: fs/cifs: fix parsing of dfs referrals cifs: make sure we ignore the credentials= and cred= options [CIFS] Update cifs version to 1.78 cifs - check S_AUTOMOUNT in revalidate cifs: add missing initialization of server->req_lock cifs: don't cap ra_pages at the same level as default_backing_dev_info CIFS: Fix indentation in cifs_show_options
2012-05-04Btrfs: fix crash in scrub repair code when device is missingStefan Behrens
Fix that when scrub tries to repair an I/O or checksum error and one of the devices containing the mirror is missing, it crashes in bio_add_page because the bdev is a NULL pointer for missing devices. Reported-by: Marco L. Crociani <marco.crociani@gmail.com> Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-05-04btrfs: Fix mismatching struct members in ioctl.hAlexander Block
Fix the size members of btrfs_ioctl_ino_path_args and btrfs_ioctl_logical_ino_args. The user space btrfs-progs utilities used __u64 and the kernel headers used __u32 before. Signed-off-by: Alexander Block <ablock84@googlemail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-05-04Btrfs: fix page leak when allocing extent buffersJosef Bacik
If we happen to alloc a extent buffer and then alloc a page and notice that page is already attached to an extent buffer, we will only unlock it and free our existing eb. Any pages currently attached to that eb will be properly freed, but we don't do the page_cache_release() on the page where we noticed the other extent buffer which can cause us to leak pages and I hope cause the weird issues we've been seeing in this area. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-05-04Btrfs: Add properly locking around add_root_to_dirty_listChris Mason
add_root_to_dirty_list happens once at the very beginning of the transaction, but it is still racey. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-05-03fs/cifs: fix parsing of dfs referralsStefan Metzmacher
The problem was that the first referral was parsed more than once and so the caller tried the same referrals multiple times. The problem was introduced partly by commit 066ce6899484d9026acd6ba3a8dbbedb33d7ae1b, where 'ref += le16_to_cpu(ref->Size);' got lost, but that was also wrong... Cc: <stable@vger.kernel.org> Signed-off-by: Stefan Metzmacher <metze@samba.org> Tested-by: Björn Jacke <bj@sernet.de> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
2012-05-03vfs: make word-at-a-time accesses handle a non-existing pageLinus Torvalds
It turns out that there are more cases than CONFIG_DEBUG_PAGEALLOC that can have holes in the kernel address space: it seems to happen easily with Xen, and it looks like the AMD gart64 code will also punch holes dynamically. Actually hitting that case is still very unlikely, so just do the access, and take an exception and fix it up for the very unlikely case of it being a page-crosser with no next page. And hey, this abstraction might even help other architectures that have other issues with unaligned word accesses than the possible missing next page. IOW, this could do the byte order magic too. Peter Anvin fixed a thinko in the shifting for the exception case. Reported-and-tested-by: Jana Saout <jana@saout.de> Cc: Peter Anvin <hpa@zytor.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-03cifs: make sure we ignore the credentials= and cred= optionsJeff Layton
Older mount.cifs programs passed this on to the kernel after parsing the file. Make sure the kernel ignores that option. Should fix: https://bugzilla.kernel.org/show_bug.cgi?id=43195 Cc: Sachin Prabhu <sprabhu@redhat.com> Reported-by: Ronald <ronald645@gmail.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
2012-05-03[CIFS] Update cifs version to 1.78Steve French
Signed-off-by: Steve French <sfrench@us.ibm.com>
2012-05-03cifs - check S_AUTOMOUNT in revalidateIan Kent
When revalidating a dentry, if the inode wasn't known to be a dfs entry when the dentry was instantiated, such as when created via ->readdir(), the DCACHE_NEED_AUTOMOUNT flag needs to be set on the dentry in ->d_revalidate(). The false return from cifs_d_revalidate(), due to the inode now being marked with the S_AUTOMOUNT flag, might not invalidate the dentry if there is a concurrent unlazy path walk. This is because the dentry reference count will be at least 2 in this case causing d_invalidate() to return EBUSY. So the asumption that the dentry will be discarded then correctly instantiated via ->lookup() might not hold. Signed-off-by: Ian Kent <raven@themaw.net> Reviewed-by: Jeff Layton <jlayton@redhat.com> Cc: Steve French <smfrench@gmail.com> Cc: linux-cifs@vger.kernel.org Signed-off-by: Steve French <sfrench@us.ibm.com>
2012-05-02Merge tag 'nfs-for-3.4-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
Pull NFS client bugfixes from Trond Myklebust: - Fixes for the NFSv4 security negotiation - Use the correct hostname when mounting from a private namespace - NFS net namespace bugfixes for the pipefs filesystem - NFSv4 GETACL bugfixes - IPv6 bugfix for NFSv4 referrals * tag 'nfs-for-3.4-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: NFSv4.1: Use the correct hostname in the client identifier string SUNRPC: RPC client must use the current utsname hostname string NFS: get module in idmap PipeFS notifier callback NFS: Remove unused function nfs_lookup_with_sec() NFS: Honor the authflavor set in the clone mount data NFS: Fix following referral mount points with different security NFS: Do secinfo as part of lookup NFS: Handle exceptions coming out of nfs4_proc_fs_locations() NFS: Fix SECINFO_NO_NAME SUNRPC: traverse clients tree on PipeFS event SUNRPC: set per-net PipeFS superblock before notification SUNRPC: skip clients with program without PipeFS entries SUNRPC: skip dead but not buried clients on PipeFS events Avoid beyond bounds copy while caching ACL Avoid reading past buffer when calling GETACL fix page number calculation bug for block layout decode buffer NFSv4.1 fix page number calculation bug for filelayout decode buffers pnfs-obj: Remove unused variable from objlayout_get_deviceinfo() nfs4: fix referrals on mounts that use IPv6 addrs
2012-05-01cifs: add missing initialization of server->req_lockJeff Layton
Cc: Pavel Shilovsky <piastryyy@gmail.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
2012-05-01cifs: don't cap ra_pages at the same level as default_backing_dev_infoJeff Layton
While testing, I've found that even when we are able to negotiate a much larger rsize with the server, on-the-wire reads often end up being capped at 128k because of ra_pages being capped at that level. Lifting this restriction gave almost a twofold increase in sequential read performance on my craptactular KVM test rig with a 1M rsize. I think this is safe since the actual ra_pages that the VM requests is run through max_sane_readahead() prior to submitting the I/O. Under memory pressure we should end up with large readahead requests being suppressed anyway. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
2012-05-01CIFS: Fix indentation in cifs_show_optionsSachin Prabhu
Trivial patch which fixes a misplaced tab in cifs_show_options(). Signed-off-by: Sachin Prabhu <sprabhu@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
2012-04-30nfsd: fix nfs4recover.c printk format warningRandy Dunlap
Fix printk format warnings -- both items are size_t, so use %zu to print them. fs/nfsd/nfs4recover.c:580:3: warning: format '%lu' expects type 'long unsigned int', but argument 3 has type 'size_t' fs/nfsd/nfs4recover.c:580:3: warning: format '%lu' expects type 'long unsigned int', but argument 4 has type 'unsigned int' Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: linux-nfs@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-04-30NFSv4.1: Use the correct hostname in the client identifier stringTrond Myklebust
We need to use the hostname of the process that created the nfs_client. That hostname is now stored in the rpc_client->cl_nodename. Also remove the utsname()->domainname component. There is no reason to include the NIS/YP domainname in a client identifier string. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-29autofs: make the autofsv5 packet file descriptor use a packetized pipeLinus Torvalds
The autofs packet size has had a very unfortunate size problem on x86: because the alignment of 'u64' differs in 32-bit and 64-bit modes, and because the packet data was not 8-byte aligned, the size of the autofsv5 packet structure differed between 32-bit and 64-bit modes despite looking otherwise identical (300 vs 304 bytes respectively). We first fixed that up by making the 64-bit compat mode know about this problem in commit a32744d4abae ("autofs: work around unhappy compat problem on x86-64"), and that made a 32-bit 'systemd' work happily on a 64-bit kernel because everything then worked the same way as on a 32-bit kernel. But it turned out that 'automount' had actually known and worked around this problem in user space, so fixing the kernel to do the proper 32-bit compatibility handling actually *broke* 32-bit automount on a 64-bit kernel, because it knew that the packet sizes were wrong and expected those incorrect sizes. As a result, we ended up reverting that compatibility mode fix, and thus breaking systemd again, in commit fcbf94b9dedd. With both automount and systemd doing a single read() system call, and verifying that they get *exactly* the size they expect but using different sizes, it seemed that fixing one of them inevitably seemed to break the other. At one point, a patch I seriously considered applying from Michael Tokarev did a "strcmp()" to see if it was automount that was doing the operation. Ugly, ugly. However, a prettier solution exists now thanks to the packetized pipe mode. By marking the communication pipe as being packetized (by simply setting the O_DIRECT flag), we can always just write the bigger packet size, and if user-space does a smaller read, it will just get that partial end result and the extra alignment padding will simply be thrown away. This makes both automount and systemd happy, since they now get the size they asked for, and the kernel side of autofs simply no longer needs to care - it could pad out the packet arbitrarily. Of course, if there is some *other* user of autofs (please, please, please tell me it ain't so - and we haven't heard of any) that tries to read the packets with multiple writes, that other user will now be broken - the whole point of the packetized mode is that one system call gets exactly one packet, and you cannot read a packet in pieces. Tested-by: Michael Tokarev <mjt@tls.msk.ru> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: David Miller <davem@davemloft.net> Cc: Ian Kent <raven@themaw.net> Cc: Thomas Meyer <thomas@m3y3r.de> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-04-29pipes: add a "packetized pipe" mode for writingLinus Torvalds
The actual internal pipe implementation is already really about individual packets (called "pipe buffers"), and this simply exposes that as a special packetized mode. When we are in the packetized mode (marked by O_DIRECT as suggested by Alan Cox), a write() on a pipe will not merge the new data with previous writes, so each write will get a pipe buffer of its own. The pipe buffer is then marked with the PIPE_BUF_FLAG_PACKET flag, which in turn will tell the reader side to break the read at that boundary (and throw away any partial packet contents that do not fit in the read buffer). End result: as long as you do writes less than PIPE_BUF in size (so that the pipe doesn't have to split them up), you can now treat the pipe as a packet interface, where each read() system call will read one packet at a time. You can just use a sufficiently big read buffer (PIPE_BUF is sufficient, since bigger than that doesn't guarantee atomicity anyway), and the return value of the read() will naturally give you the size of the packet. NOTE! We do not support zero-sized packets, and zero-sized reads and writes to a pipe continue to be no-ops. Also note that big packets will currently be split at write time, but that the size at which that happens is not really specified (except that it's bigger than PIPE_BUF). Currently that limit is the system page size, but we might want to explicitly support bigger packets some day. The main user for this is going to be the autofs packet interface, allowing us to stop having to care so deeply about exact packet sizes (which have had bugs with 32/64-bit compatibility modes). But user space can create packetized pipes with "pipe2(fd, O_DIRECT)", which will fail with an EINVAL on kernels that do not support this interface. Tested-by: Michael Tokarev <mjt@tls.msk.ru> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: David Miller <davem@davemloft.net> Cc: Ian Kent <raven@themaw.net> Cc: Thomas Meyer <thomas@m3y3r.de> Cc: stable@kernel.org # needed for systemd/autofs interaction fix Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-04-28NFS: get module in idmap PipeFS notifier callbackStanislav Kinsbursky
This is bug fix. Notifier callback is called from SUNRPC module. So before dereferencing NFS module we have to make sure, that it's alive. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-28Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs fixes from Chris Mason: "This has our collection of bug fixes. I missed the last rc because I thought our patches were making NFS crash during my xfs test runs. Turns out it was an NFS client bug fixed by someone else while I tried to bisect it. All of these fixes are small, but some are fairly high impact. The biggest are fixes for our mount -o remount handling, a deadlock due to GFP_KERNEL allocations in readdir, and a RAID10 error handling bug. This was tested against both 3.3 and Linus' master as of this morning." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (26 commits) Btrfs: reduce lock contention during extent insertion Btrfs: avoid deadlocks from GFP_KERNEL allocations during btrfs_real_readdir Btrfs: Fix space checking during fs resize Btrfs: fix block_rsv and space_info lock ordering Btrfs: Prevent root_list corruption Btrfs: fix repair code for RAID10 Btrfs: do not start delalloc inodes during sync Btrfs: fix that check_int_data mount option was ignored Btrfs: don't count CRC or header errors twice while scrubbing Btrfs: fix btrfs_ioctl_dev_info() crash on missing device btrfs: don't return EINTR Btrfs: double unlock bug in error handling Btrfs: always store the mirror we read the eb from fs/btrfs/volumes.c: add missing free_fs_devices btrfs: fix early abort in 'remount' Btrfs: fix max chunk size check in chunk allocator Btrfs: add missing read locks in backref.c Btrfs: don't call free_extent_buffer twice in iterate_irefs Btrfs: Make free_ipath() deal gracefully with NULL pointers Btrfs: avoid possible use-after-free in clear_extent_bit() ...
2012-04-28Revert "autofs: work around unhappy compat problem on x86-64"Linus Torvalds
This reverts commit a32744d4abae24572eff7269bc17895c41bd0085. While that commit was technically the right thing to do, and made the x86-64 compat mode work identically to native 32-bit mode (and thus fixing the problem with a 32-bit systemd install on a 64-bit kernel), it turns out that the automount binaries had workarounds for this compat problem. Now, the workarounds are disgusting: doing an "uname()" to find out the architecture of the kernel, and then comparing it for the 64-bit cases and fixing up the size of the read() in automount for those. And they were confused: it's not actually a generic 64-bit issue at all, it's very much tied to just x86-64, which has different alignment for an 'u64' in 64-bit mode than in 32-bit mode. But the end result is that fixing the compat layer actually breaks the case of a 32-bit automount on a x86-64 kernel. There are various approaches to fix this (including just doing a "strcmp()" on current->comm and comparing it to "automount"), but I think that I will do the one that teaches pipes about a special "packet mode", which will allow user space to not have to care too deeply about the padding at the end of the autofs packet. That change will make the compat workaround unnecessary, so let's revert it first, and get automount working again in compat mode. The packetized pipes will then fix autofs for systemd. Reported-and-requested-by: Michael Tokarev <mjt@tls.msk.ru> Cc: Ian Kent <raven@themaw.net> Cc: stable@kernel.org # for 3.3 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-04-27Merge git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull CIFS fixes from Steve French. * git://git.samba.org/sfrench/cifs-2.6: Use correct conversion specifiers in cifs_show_options CIFS: Show backupuid/gid in /proc/mounts cifs: fix offset handling in cifs_iovec_write
2012-04-27Btrfs: reduce lock contention during extent insertionChris Mason
We're spending huge amounts of time on lock contention during end_io processing because we unconditionally assume we are overwriting an existing extent in the file for each IO. This checks to see if we are outside i_size, and if so, it uses a less expensive readonly search of the btree to look for existing extents. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-04-27Btrfs: avoid deadlocks from GFP_KERNEL allocations during btrfs_real_readdirChris Mason
Btrfs has an optimization where it will preallocate dentries during readdir to fill in enough information to open the inode without an extra lookup. But, we're calling d_alloc, which is doing GFP_KERNEL allocations, and that leads to deadlocks because our readdir code has tree locks held. For now, disable this optimization. We'll fix the gfp mask in the next merge window. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-04-27NFS: Remove unused function nfs_lookup_with_sec()Bryan Schumaker
This fixes a compiler warning. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-27NFS: Honor the authflavor set in the clone mount dataBryan Schumaker
The authflavor is set in an nfs_clone_mount structure and passed to the xdev_mount() functions where it was promptly ignored. Instead, use it to initialize an rpc_clnt for the cloned server. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-27NFS: Fix following referral mount points with different securityBryan Schumaker
I create a new proc_lookup_mountpoint() to use when submounting an NFS v4 share. This function returns an rpc_clnt to use for performing an fs_locations() call on a referral's mountpoint. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-27NFS: Do secinfo as part of lookupBryan Schumaker
Whenever lookup sees wrongsec do a secinfo and retry the lookup to find attributes of the file or directory, such as "is this a referral mountpoint?". This also allows me to remove handling -NFS4ERR_WRONSEC as part of getattr xdr decoding. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-27NFS: Handle exceptions coming out of nfs4_proc_fs_locations()Bryan Schumaker
We don't want to return -NFS4ERR_WRONGSEC to the VFS because it could cause the kernel to oops. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-27NFS: Fix SECINFO_NO_NAMEBryan Schumaker
I was using the same decoder function for SECINFO and SECINFO_NO_NAME, so it was returning an error when it tried to decode an OP_SECINFO_NO_NAME header as OP_SECINFO. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-27Avoid beyond bounds copy while caching ACLSachin Prabhu
When attempting to cache ACLs returned from the server, if the bitmap size + the ACL size is greater than a PAGE_SIZE but the ACL size itself is smaller than a PAGE_SIZE, we can read past the buffer page boundary. Signed-off-by: Sachin Prabhu <sprabhu@redhat.com> Reported-by: Jian Li <jiali@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-27Btrfs: Fix space checking during fs resizeDaniel J Blueman
Fix out-of-space checking, addressing a warning and potential resource leak when resizing the filesystem down while allocating blocks. Signed-off-by: Daniel J Blueman <daniel@quora.org> Reviewed-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-04-27Btrfs: fix block_rsv and space_info lock orderingStefan Behrens
may_commit_transaction() calls spin_lock(&space_info->lock); spin_lock(&delayed_rsv->lock); and update_global_block_rsv() calls spin_lock(&block_rsv->lock); spin_lock(&sinfo->lock); Lockdep complains about this at run time. Everywhere except in update_global_block_rsv(), the space_info lock is the outer lock, therefore the locking order in update_global_block_rsv() is changed. Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-04-27Btrfs: Prevent root_list corruptionDaniel J Blueman
I was seeing root_list corruption on unmount during fs resize in 3.4-rc4; add correct locking to address this. Signed-off-by: Daniel J Blueman <daniel@quora.org> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-04-27Btrfs: fix repair code for RAID10Jan Schmidt
btrfs_map_block sets mirror_num, so that the repair code knows eventually which device gave us the read error. For RAID10, mirror_num must be 1 or 2. Before this fix mirror_num was incorrectly related to our stripe index. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-04-27Btrfs: do not start delalloc inodes during syncJosef Bacik
btrfs_start_delalloc_inodes will just walk the list of delalloc inodes and start writing them out, but it doesn't splice the list or anything so as long as somebody is doing work on the box you could end up in this section _forever_. So just remove it, it's not needed anyway since sync will start writeback on all inodes anyway, all we need to do is wait for ordered extents and then we can commit the transaction. In my horrible torture test sync goes from taking 4 minutes to about 1.5 minutes. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-04-27Avoid reading past buffer when calling GETACLSachin Prabhu
Bug noticed in commit bf118a342f10dafe44b14451a1392c3254629a1f When calling GETACL, if the size of the bitmap array, the length attribute and the acl returned by the server is greater than the allocated buffer(args.acl_len), we can Oops with a General Protection fault at _copy_from_pages() when we attempt to read past the pages allocated. This patch allocates an extra PAGE for the bitmap and checks to see that the bitmap + attribute_length + ACLs don't exceed the buffer space allocated to it. Signed-off-by: Sachin Prabhu <sprabhu@redhat.com> Reported-by: Jian Li <jiali@redhat.com> [Trond: Fixed a size_t vs unsigned int printk() warning] Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-26Merge branch 'akpm' (Andrew's patch-bomb)Linus Torvalds
Merge fixes from Andrew Morton: "13 fixes. The acerhdf patches aren't (really) fixes. But they've been stuck in my tree for up to two years, sent to Matthew multiple times and the developers are unhappy." * emailed from Andrew Morton <akpm@linux-foundation.org>: (13 patches) mm: fix NULL ptr dereference in move_pages mm: fix NULL ptr dereference in migrate_pages revert "proc: clear_refs: do not clear reserved pages" drivers/rtc/rtc-ds1307.c: fix BUG shown with lock debugging enabled arch/arm/mach-ux500/mbox-db5500.c: world-writable sysfs fifo file hugetlbfs: lockdep annotate root inode properly acerhdf: lowered default temp fanon/fanoff values acerhdf: add support for new hardware acerhdf: add support for Aspire 1410 BIOS v1.3314 fs/buffer.c: remove BUG() in possible but rare condition mm: fix up the vmscan stat in vmstat epoll: clear the tfile_check_list on -ELOOP mm/hugetlb: fix warning in alloc_huge_page/dequeue_huge_page_vma
2012-04-26fix page number calculation bug for block layout decode bufferJim Rees
Signed-off-by: Jim Rees <rees@umich.edu> Suggested-by: Andy Adamson <andros@netapp.com> Suggested-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-26NFSv4.1 fix page number calculation bug for filelayout decode buffersAndy Adamson
Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-26pnfs-obj: Remove unused variable from objlayout_get_deviceinfo()Sachin Bhamare
Local variable 'sb' was not being used in objlayout_get_deviceinfo(). Signed-off-by: Sachin Bhamare <sbhamare@panasas.com> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-26nfs4: fix referrals on mounts that use IPv6 addrsWeston Andros Adamson
All referrals (IPv4 addr, IPv6 addr, and DNS) are broken on mounts of IPv6 addresses, because validation code uses a path that is parsed from the dev_name ("<server>:<path>") by splitting on the first colon and colons are used in IPv6 addrs. This patch ignores colons within IPv6 addresses that are escaped by '[' and ']'. Signed-off-by: Weston Andros Adamson <dros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-25Merge tag 'nfs-for-3.4-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
Pull NFS client bugfixes from Trond Myklebust: - Fix NFSv4 infinite loops on open(O_TRUNC) - Fix an Oops and an infinite loop in the NFSv4 flock code - Don't register the PipeFS filesystem until it has been set up - Fix an Oops in nfs_try_to_update_request - Don't reuse NFSv4 open owners: fixes a bad sequence id storm. * tag 'nfs-for-3.4-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: NFSv4: Keep dropped state owners on the LRU list for a while NFSv4: Ensure that we don't drop a state owner more than once NFSv4: Ensure we do not reuse open owner names nfs: Enclose hostname in brackets when needed in nfs_do_root_mount NFS: put open context on error in nfs_flush_multi NFS: put open context on error in nfs_pagein_multi NFSv4: Fix open(O_TRUNC) and ftruncate() error handling NFSv4: Ensure that we check lock exclusive/shared type against open modes NFSv4: Ensure that the LOCK code sets exception->inode NFS: check for req==NULL in nfs_try_to_update_request cleanup SUNRPC: register PipeFS file system after pernet sybsystem
2012-04-25revert "proc: clear_refs: do not clear reserved pages"Will Deacon
Revert commit 85e72aa5384 ("proc: clear_refs: do not clear reserved pages"), which was a quick fix suitable for -stable until ARM had been moved over to the gate_vma mechanism: https://lkml.org/lkml/2012/1/14/55 With commit f9d4861f ("ARM: 7294/1: vectors: use gate_vma for vectors user mapping"), ARM does now use the gate_vma, so the PageReserved check can be removed from the proc code. Signed-off-by: Will Deacon <will.deacon@arm.com> Cc: Nicolas Pitre <nico@linaro.org> Acked-by: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-04-25hugetlbfs: lockdep annotate root inode properlyAneesh Kumar K.V
This fixes the below reported false lockdep warning. e096d0c7e2e4 ("lockdep: Add helper function for dir vs file i_mutex annotation") added a similar annotation for every other inode in hugetlbfs but missed the root inode because it was allocated by a separate function. For HugeTLB fs we allow taking i_mutex in mmap. HugeTLB fs doesn't support file write and its file read callback is modified in a05b0855fd ("hugetlbfs: avoid taking i_mutex from hugetlbfs_read()") to not take i_mutex. Hence for HugeTLB fs with regular files we really don't take i_mutex with mmap_sem held. ====================================================== [ INFO: possible circular locking dependency detected ] 3.4.0-rc1+ #322 Not tainted ------------------------------------------------------- bash/1572 is trying to acquire lock: (&mm->mmap_sem){++++++}, at: [<ffffffff810f1618>] might_fault+0x40/0x90 but task is already holding lock: (&sb->s_type->i_mutex_key#12){+.+.+.}, at: [<ffffffff81125f88>] vfs_readdir+0x56/0xa8 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&sb->s_type->i_mutex_key#12){+.+.+.}: [<ffffffff810a09e5>] lock_acquire+0xd5/0xfa [<ffffffff816a2f5e>] __mutex_lock_common+0x48/0x350 [<ffffffff816a3325>] mutex_lock_nested+0x2a/0x31 [<ffffffff811fb8e1>] hugetlbfs_file_mmap+0x7d/0x104 [<ffffffff810f859a>] mmap_region+0x272/0x47d [<ffffffff810f8a39>] do_mmap_pgoff+0x294/0x2ee [<ffffffff810f8b65>] sys_mmap_pgoff+0xd2/0x10e [<ffffffff8103d19e>] sys_mmap+0x1d/0x1f [<ffffffff816a5922>] system_call_fastpath+0x16/0x1b -> #0 (&mm->mmap_sem){++++++}: [<ffffffff810a0256>] __lock_acquire+0xa81/0xd75 [<ffffffff810a09e5>] lock_acquire+0xd5/0xfa [<ffffffff810f1645>] might_fault+0x6d/0x90 [<ffffffff81125d62>] filldir+0x6a/0xc2 [<ffffffff81133a83>] dcache_readdir+0x5c/0x222 [<ffffffff81125fa8>] vfs_readdir+0x76/0xa8 [<ffffffff811260b6>] sys_getdents+0x79/0xc9 [<ffffffff816a5922>] system_call_fastpath+0x16/0x1b other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&sb->s_type->i_mutex_key#12); lock(&mm->mmap_sem); lock(&sb->s_type->i_mutex_key#12); lock(&mm->mmap_sem); *** DEADLOCK *** 1 lock held by bash/1572: #0: (&sb->s_type->i_mutex_key#12){+.+.+.}, at: [<ffffffff81125f88>] vfs_readdir+0x56/0xa8 stack backtrace: Pid: 1572, comm: bash Not tainted 3.4.0-rc1+ #322 Call Trace: [<ffffffff81699a3c>] print_circular_bug+0x1f8/0x209 [<ffffffff810a0256>] __lock_acquire+0xa81/0xd75 [<ffffffff810f38aa>] ? handle_pte_fault+0x5ff/0x614 [<ffffffff8109e622>] ? mark_lock+0x2d/0x258 [<ffffffff810f1618>] ? might_fault+0x40/0x90 [<ffffffff810a09e5>] lock_acquire+0xd5/0xfa [<ffffffff810f1618>] ? might_fault+0x40/0x90 [<ffffffff816a3249>] ? __mutex_lock_common+0x333/0x350 [<ffffffff810f1645>] might_fault+0x6d/0x90 [<ffffffff810f1618>] ? might_fault+0x40/0x90 [<ffffffff81125d62>] filldir+0x6a/0xc2 [<ffffffff81133a83>] dcache_readdir+0x5c/0x222 [<ffffffff81125cf8>] ? sys_ioctl+0x74/0x74 [<ffffffff81125cf8>] ? sys_ioctl+0x74/0x74 [<ffffffff81125cf8>] ? sys_ioctl+0x74/0x74 [<ffffffff81125fa8>] vfs_readdir+0x76/0xa8 [<ffffffff811260b6>] sys_getdents+0x79/0xc9 [<ffffffff816a5922>] system_call_fastpath+0x16/0x1b Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: Dave Jones <davej@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Josh Boyer <jwboyer@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mimi Zohar <zohar@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-04-25fs/buffer.c: remove BUG() in possible but rare conditionGlauber Costa
While stressing the kernel with with failing allocations today, I hit the following chain of events: alloc_page_buffers(): bh = alloc_buffer_head(GFP_NOFS); if (!bh) goto no_grow; <= path taken grow_dev_page(): bh = alloc_page_buffers(page, size, 0); if (!bh) goto failed; <= taken, consequence of the above and then the failed path BUG()s the kernel. The failure is inserted a litte bit artificially, but even then, I see no reason why it should be deemed impossible in a real box. Even though this is not a condition that we expect to see around every time, failed allocations are expected to be handled, and BUG() sounds just too much. As a matter of fact, grow_dev_page() can return NULL just fine in other circumstances, so I propose we just remove it, then. Signed-off-by: Glauber Costa <glommer@parallels.com> Cc: Michal Hocko <mhocko@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-04-25epoll: clear the tfile_check_list on -ELOOPJason Baron
An epoll_ctl(,EPOLL_CTL_ADD,,) operation can return '-ELOOP' to prevent circular epoll dependencies from being created. However, in that case we do not properly clear the 'tfile_check_list'. Thus, add a call to clear_tfile_check_list() for the -ELOOP case. Signed-off-by: Jason Baron <jbaron@redhat.com> Reported-by: Yurij M. Plotnikov <Yurij.Plotnikov@oktetlabs.ru> Cc: Nelson Elhage <nelhage@nelhage.com> Cc: Davide Libenzi <davidel@xmailserver.org> Tested-by: Alexandra N. Kossovsky <Alexandra.Kossovsky@oktetlabs.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>