linux.git - Linux kernel

Age	Commit message	Author
2011-05-24	ceph: fix cap flush race reentrancy	Sage Weil
	In e9964c10 we changed cap flushing to do a delicate dance because some inodes on the cap_dirty list could be in a migrating state (got EXPORT but not IMPORT) in which we couldn't actually flush and move from dirty->flushing, breaking the while (!empty) { process first } loop structure. It worked for a single sync thread, but was not reentrant and triggered infinite loops when multiple syncers came along. Instead, move inodes with dirty caps to a separate cap_dirty_migrating list when in the limbo export-but-no-import state, allowing us to go back to the simple loop structure (which was reentrant). This is cleaner and more robust. Audited the cap_dirty users and this looks fine: list_empty(&ci->i_dirty_item) is still a reliable indicator of whether we have dirty caps (which list we're on is irrelevant) and list_del_init() calls still do the right thing. Signed-off-by: Sage Weil <sage@newdream.net>
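The loop structure described above can be illustrated with a minimal user-space sketch. It is not the kernel code: the struct and function names (`inode_item`, `cap_lists`, `mark_migrating`, `flush_dirty`) are hypothetical, and a plain singly linked list stands in for the kernel's `list_head`. The point is only that once migrating inodes are parked on a second list, every pass of the flush loop removes the head of the dirty list, so the loop always makes progress.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an inode with dirty caps. */
struct inode_item {
    bool migrating;            /* exported but not yet imported */
    bool flushed;
    struct inode_item *next;
};

/* Hypothetical stand-in for the two lists named in the commit message. */
struct cap_lists {
    struct inode_item *dirty;            /* plays the role of cap_dirty */
    struct inode_item *dirty_migrating;  /* plays the role of cap_dirty_migrating */
};

/* Move one inode from the dirty list to the migrating list. */
void mark_migrating(struct cap_lists *l, struct inode_item *it)
{
    struct inode_item **pp = &l->dirty;

    while (*pp && *pp != it)
        pp = &(*pp)->next;
    if (!*pp)
        return;                /* not on the dirty list, nothing to do */

    *pp = it->next;            /* unlink from dirty */
    it->next = l->dirty_migrating;
    l->dirty_migrating = it;
    it->migrating = true;
}

/*
 * Simple "while (!empty) { process first }" loop. Each pass unlinks the
 * head, so the loop terminates regardless of how many callers run it
 * (real code would also hold a lock around the list manipulation).
 */
int flush_dirty(struct cap_lists *l)
{
    int flushed = 0;

    while (l->dirty) {
        struct inode_item *it = l->dirty;

        l->dirty = it->next;
        it->next = NULL;
        it->flushed = true;
        flushed++;
    }
    return flushed;
}
```

With three inodes on the dirty list and the middle one marked as migrating, `flush_dirty` flushes two inodes and leaves the third parked on the migrating list.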
2011-05-24	ceph: avoid inode lookup on nfs fh reconnect	Sage Weil
	If we get the inode from the MDS, we have a reference in req; don't do a fresh lookup. Signed-off-by: Sage Weil <sage@newdream.net>
2011-05-24	ceph: use LOOKUPINO to make unconnected nfs fh more reliable	Sage Weil
	If we are unable to locate an inode by ino, ask the MDS using the new LOOKUPINO command. Signed-off-by: Sage Weil <sage@newdream.net>
2011-05-19	ceph: check return value for start_request in writepages	Sage Weil
	Since we pass the nofail arg, we should never get an error; BUG if we do. (And fix the function to not return an error if __map_request fails.) Signed-off-by: Sage Weil <sage@newdream.net>
2011-05-19	ceph: remove useless check	Sage Weil
	rc is only ever 0 or negative in this method. Signed-off-by: Sage Weil <sage@newdream.net>
2011-05-19	ceph: fix broken comparison in readdir loop	Sage Weil
	Both off and fi->offset are unsigned, so the difference is always >= 0. Compare them directly instead of the sign of the difference. Signed-off-by: Sage Weil <sage@newdream.net>
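A small stand-alone illustration of why the sign of an unsigned difference carries no information. The function names are hypothetical and the parameters only mirror `off` and `fi->offset` loosely; this is a sketch of the arithmetic, not the readdir code.

```c
#include <stdbool.h>

/*
 * Unsigned subtraction wraps modulo 2^N, so the result is never negative.
 * With off < start the difference is a very large positive number.
 */
unsigned int wrapped_diff(unsigned int off, unsigned int start)
{
    return off - start;
}

/* Comparing the operands directly gives the intended answer. */
bool at_or_past(unsigned int off, unsigned int start)
{
    return off >= start;
}
```

For example, `wrapped_diff(3, 5)` yields a value just below the maximum of `unsigned int`, so a test of the form `off - start >= 0` is always true, while `at_or_past(3, 5)` is false.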
2011-05-19	ceph: fix rare potential cap leak	Sage Weil
	If we grab new_cap, retake the lock, and find we already have a cap now for the given mds, release new_cap. Signed-off-by: Sage Weil <sage@newdream.net>
2011-05-19	ceph: use snprintf for dirstat content	Sage Weil
	We allocate a buffer for rstats if the dirstat option is enabled. Use snprintf. Signed-off-by: Sage Weil <sage@newdream.net>
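A minimal sketch of bounded formatting with snprintf. The helper name `format_dirstat` and the field labels are assumptions made for illustration; the actual dirstat layout is not given in the commit message.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Format two hypothetical counters into buf without writing more than
 * len bytes. Returns 0 on success, -1 if the output was truncated or
 * snprintf reported an error.
 */
int format_dirstat(char *buf, size_t len,
                   unsigned long long entries, unsigned long long rbytes)
{
    int n = snprintf(buf, len, "entries: %llu\nrbytes: %llu\n",
                     entries, rbytes);

    if (n < 0 || (size_t)n >= len)
        return -1;
    return 0;
}
```

Unlike sprintf, snprintf never writes past `len` bytes and always NUL-terminates when `len` is nonzero, so an undersized buffer results in truncated output rather than an overflow.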
2011-05-19	libceph: remove unused variable	Sage Weil
	Signed-off-by: Sage Weil <sage@newdream.net>
2011-05-19	ceph: take reference on mds request r_unsafe_dir	Sage Weil
	We put ourselves on an inode list for the parent directory of metadata operations so that an fsync on the directory will wait for metadata updates to commit to disk. We weren't holding a reference to that directory, however, and under certain workloads (fsstress in this case) the directory can go away. Signed-off-by: Sage Weil <sage@newdream.net>
2011-05-11	ceph: do not use i_wrbuffer_ref as refcount for Fb cap	Henry C Chang
	We increment i_wrbuffer_ref when taking the Fb cap. This breaks the dirty page accounting and causes looping in __ceph_do_pending_vmtruncate, and the ceph client hangs. This bug can be reproduced occasionally by running blogbench. Add a new field i_wb_ref to inode and dedicate it to Fb reference counting. Signed-off-by: Henry C Chang <henry.cy.chang@gmail.com> Signed-off-by: Sage Weil <sage@newdream.net>
2011-05-11	ceph: fix list_add in ceph_put_snap_realm	Henry C Chang
	Signed-off-by: Henry C Chang <henry.cy.chang@gmail.com> Signed-off-by: Sage Weil <sage@newdream.net>
2011-05-11	ceph: print debug message before put mds session	Henry C Chang
	The mds session, s, could be freed during ceph_put_mds_session. Move dout before ceph_put_mds_session. Signed-off-by: Henry C Chang <henry.cy.chang@gmail.com> Signed-off-by: Sage Weil <sage@newdream.net>
2011-05-04	ceph: do not call __mark_dirty_inode under i_lock	Sage Weil
	The __mark_dirty_inode helper now takes i_lock as of 250df6ed. Fix the one ceph caller that held i_lock (__ceph_mark_dirty_caps) to return the flags value so that the callers can do it outside of i_lock. Signed-off-by: Sage Weil <sage@newdream.net>
2011-05-03	ceph: handle ceph_osdc_new_request failure in ceph_writepages_start	Henry C Chang
	We should unlock the page and return -ENOMEM if ceph_osdc_new_request failed. Signed-off-by: Henry C Chang <henry_c_chang@tcloudcomputing.com> Signed-off-by: Sage Weil <sage@newdream.net>
2011-05-03	ceph: use ihold() when i_lock is held	Sage Weil
	See 0444d76ae64fffc7851797fc1b6ebdbb44ac504a. Signed-off-by: Sage Weil <sage@newdream.net>
2011-04-07	Merge branch 'for-linus2' of git://git.profusion.mobi/users/lucas/linux-2.6	Linus Torvalds
	* 'for-linus2' of git://git.profusion.mobi/users/lucas/linux-2.6: Fix common misspellings
2011-03-31	Fix common misspellings	Lucas De Marchi
	Fixes generated by 'codespell' and manually reviewed. Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
2011-03-30	Merge git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client	Linus Torvalds
	* git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
	  libceph: Create a new key type "ceph".
	  libceph: Get secret from the kernel keys api when mounting with key=NAME.
	  ceph: Move secret key parsing earlier.
	  libceph: fix null dereference when unregistering linger requests
	  ceph: unlock on error in ceph_osdc_start_request()
	  ceph: fix possible NULL pointer dereference
	  ceph: flush msgr_wq during mds_client shutdown
2011-03-29	ceph: Move secret key parsing earlier.	Tommi Virtanen
	This makes the base64 logic be contained in mount option parsing, and prepares us for replacing the homebrew key management with the kernel key retention service. Signed-off-by: Tommi Virtanen <tommi.virtanen@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>
2011-03-29	fs: don't use igrab() while holding i_lock	Dave Chinner
	Fix the incorrect use of igrab() inside the i_lock in NFS and Ceph. If we are already holding the i_lock, we have a reference to the inode so we can safely use ihold() to gain an extra reference. This avoids hangs due to lock recursion on the i_lock now that the inode_lock is gone and igrab() uses the i_lock itself. Signed-off-by: Dave Chinner <dchinner@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Cc: Ryan Mallon <ryan@bluewatersys.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-03-25	ceph: flush msgr_wq during mds_client shutdown	Sage Weil
	The release method for mds connections uses a backpointer to the mds_client, so we need to flush the workqueue of any pending work (and ceph_connection references) prior to freeing the mds_client. This fixes an oops easily triggered under UML by while true ; do mount ... ; umount ... ; done Also fix an outdated comment: the flush in ceph_destroy_client only flushes OSD connections out. This bug is basically an artifact of the ceph -> ceph+libceph conversion. Signed-off-by: Sage Weil <sage@newdream.net>
2011-03-21	ceph: rename dentry_release -> d_release, fix comment	Sage Weil
	Just for consistency's sake. Fix obsolete comment too. Signed-off-by: Sage Weil <sage@newdream.net>
2011-03-21	ceph: add request to the tail of unsafe write list	Henry C Chang
	In sync_write_wait(), we assume that the newest request is at the tail of unsafe write list. We should maintain the semantics here. Signed-off-by: Henry C Chang <henry_c_chang@tcloudcomputing.com> Signed-off-by: Sage Weil <sage@newdream.net>
2011-03-21	ceph: remove request from unsafe list if it is canceled/timed out	Henry C Chang
	This fixes the list corruption warning like this:
	------------[ cut here ]------------
	WARNING: at lib/list_debug.c:30 __list_add+0x68/0x81()
	Hardware name: X8DTU
	list_add corruption. prev->next should be next (ffff880618931250), but was (null). (prev=ffff880c188b9130).
	Modules linked in: nfsd lockd nfs_acl auth_rpcgss exportfs ceph libceph libcrc32c sunrpc ipv6 fuse igb i2c_i801 ioatdma i2c_core iTCO_wdt iTCO_vendor_support joydev dca serio_raw usb_storage [last unloaded: scsi_wait_scan]
	Pid: 10977, comm: smbd Tainted: G W 2.6.32.23-170.Elaster.xendom0.fc12.x86_64 #1
	Call Trace:
	 [<ffffffff8105753c>] warn_slowpath_common+0x7c/0x94
	 [<ffffffff810575ab>] warn_slowpath_fmt+0x41/0x43
	 [<ffffffff812351a3>] __list_add+0x68/0x81
	 [<ffffffffa014799d>] ceph_aio_write+0x614/0x8a2 [ceph]
	 [<ffffffff8111d2a0>] do_sync_write+0xe8/0x125
	 [<ffffffff81075a1f>] ? autoremove_wake_function+0x0/0x39
	 [<ffffffff811f21ec>] ? selinux_file_permission+0x5c/0xb3
	 [<ffffffff811e8521>] ? security_file_permission+0x16/0x18
	 [<ffffffff8111d864>] vfs_write+0xae/0x10b
	 [<ffffffff8111d91b>] sys_pwrite64+0x5a/0x76
	 [<ffffffff81012d32>] system_call_fastpath+0x16/0x1b
	---[ end trace 08573eb9f07ff6f4 ]---
	Signed-off-by: Henry C Chang <henry_c_chang@tcloudcomputing.com> Signed-off-by: Sage Weil <sage@newdream.net>
2011-03-21	ceph: move readahead default to fs/ceph from libceph	Sage Weil
	Signed-off-by: Sage Weil <sage@newdream.net>
2011-03-21	ceph: add ino32 mount option	Yehuda Sadeh
	The ino32 mount option forces the ceph fs to report 32 bit ino values. This is useful for 64 bit kernels with 32 bit userspace. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
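The commit message does not say how a 64-bit inode number is reduced to 32 bits. The sketch below shows one plausible folding scheme, XOR of the high and low halves; both the scheme and the name `fold_ino32` are assumptions for illustration, not a statement of what the option actually does.

```c
#include <stdint.h>

/*
 * Assumed folding scheme: XOR the upper and lower 32-bit halves.
 * Zero is mapped to 1 on the assumption that userspace treats an
 * inode number of 0 as invalid.
 */
uint32_t fold_ino32(uint64_t ino)
{
    uint32_t ino32 = (uint32_t)(ino & 0xffffffffu) ^ (uint32_t)(ino >> 32);

    return ino32 ? ino32 : 1;
}
```

Values that already fit in 32 bits pass through unchanged under this scheme, and distinct 64-bit values can collide after folding, which is an inherent cost of reporting a narrower number.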
2011-03-21	ceph: remove debugfs debug cruft	Sage Weil
	Whoops! Signed-off-by: Sage Weil <sage@newdream.net>
2011-03-15	ceph: preserve I_COMPLETE across rename	Sage Weil
	d_move puts the renamed dentry at the end of d_subdirs, screwing with our cached dentry directory offsets. We were just clearing I_COMPLETE to avoid any possibility of trouble. However, assigning the renamed dentry an offset at the end of the directory (to match its new d_subdirs position) is sufficient to maintain correct behavior and hold onto I_COMPLETE. This is especially important for workloads like rsync, which renames files into place. Before, we would lose I_COMPLETE and do MDS lookups for each file. With this patch we only talk to the MDS on create and rename. Signed-off-by: Sage Weil <sage@newdream.net>
2011-03-10	ceph: fix d_revalidate oopsen on NFS exports	Al Viro
	can't blindly check nd->flags in ->d_revalidate() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-04	ceph: no .snap inside of snapped namespace	Sage Weil
	Otherwise you can do things like
	 # mkdir .snap/foo
	 # cd .snap/foo/.snap
	 # ls
	 <badness>
	Signed-off-by: Sage Weil <sage@newdream.net>
2011-03-03	ceph: do not clear I_COMPLETE from d_release	Sage Weil
	First, this was racy anyway: d_release isn't called until well after the dentry is unhashed. Second, this runs afoul of the recent dcache change that clears d_parent prior to calling d_release (949854d0), causing a NULL pointer dereference. Signed-off-by: Sage Weil <sage@newdream.net>
2011-03-03	ceph: do not set I_COMPLETE	Sage Weil
	Do not set the I_COMPLETE flag on directories until we resolve races with dcache pruning. Signed-off-by: Sage Weil <sage@newdream.net>
2011-03-03	Revert "ceph: keep reference to parent inode on ceph_dentry"	Sage Weil
	This reverts commit 97d79b403ef03f729883246208ef5d8a2ebc4d68. This fails to account for d_parent changes due to rename or disconnected dentries due to submounts or NFS reexports. Signed-off-by: Sage Weil <sage@newdream.net>
2011-02-21	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client	Linus Torvalds
	* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
	  ceph: keep reference to parent inode on ceph_dentry
	  ceph: queue cap_snaps once per realm
	  libceph: fix socket write error handling
	  libceph: fix socket read error handling
2011-02-19	ceph: keep reference to parent inode on ceph_dentry	Yehuda Sadeh
	When creating a new dentry we now hold a reference to the parent inode in the ceph_dentry. This is required due to the new RCU changes from 949854d0, which set dentry->d_parent to NULL in d_kill before calling the ->release() callback. If/when that behavior is changed, we can revert this hack. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
2011-02-04	ceph: queue cap_snaps once per realm	Sage Weil
	We were forming a dirty list, and then queueing cap_snaps for each realm _and_ its children, regardless of whether the children were already in the dirty list. This meant we did it twice for some realms. Which in turn meant we corrupted mdsc->snap_flush_list when the cap_snap was re-added to the list it was already on, and could trigger an infinite loop. We were also using recursion to reach all the children, a no-no when stack is limited. Instead, (re)queue any children on the dirty list, avoiding processing anything twice and avoiding any recursion. Signed-off-by: Sage Weil <sage@newdream.net>
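The approach of queueing children on a worklist instead of recursing can be sketched in user space as follows. Everything here is hypothetical (`struct realm`, `process_dirty`, the fixed-size array queue and the parent-index tree); it only demonstrates that a "queued" flag plus an explicit queue visits each realm exactly once without using the call stack.

```c
#include <stdbool.h>

#define MAX_REALMS 16

/* Hypothetical realm: parent index forms the tree, -1 marks the root. */
struct realm {
    int parent;
    bool queued;      /* already on the dirty worklist? */
    int processed;    /* how many times this realm was handled */
};

/*
 * Handle every dirty realm and all of its descendants exactly once.
 * n must not exceed MAX_REALMS; each realm is queued at most once, so
 * the queue cannot overflow.
 */
void process_dirty(struct realm *realms, int n, const int *dirty, int ndirty)
{
    int queue[MAX_REALMS];
    int head = 0, tail = 0;

    /* Seed the worklist, skipping duplicates. */
    for (int i = 0; i < ndirty; i++) {
        int r = dirty[i];

        if (!realms[r].queued) {
            realms[r].queued = true;
            queue[tail++] = r;
        }
    }

    /* Drain the worklist, appending children instead of recursing. */
    while (head < tail) {
        int r = queue[head++];

        realms[r].processed++;
        for (int c = 0; c < n; c++) {
            if (realms[c].parent == r && !realms[c].queued) {
                realms[c].queued = true;
                queue[tail++] = c;
            }
        }
    }
}
```

If a realm and one of its children both start out dirty, the child is still processed only once, because the flag is checked before it is appended a second time.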
2011-01-28	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client	Linus Torvalds
	* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
	  ceph: avoid picking MDS that is not active
	  ceph: avoid immediate cap check after import
	  ceph: fix flushing of caps vs cap import
	  ceph: fix erroneous cap flush to non-auth mds
	  ceph: fix cap_wanted_delay_{min,max} mount option initialization
	  ceph: fix xattr rbtree search
	  ceph: fix getattr on directory when using norbytes
2011-01-25	ceph: avoid picking MDS that is not active	Sage Weil
	Ignore replication or auth frag data if it indicates an MDS that is not active. This can happen if the MDS shuts down and the client has stale data about the namespace distribution across the MDS cluster. If that's the case, fall back to directing the request based on the auth cap (which should always be accurate). Signed-off-by: Sage Weil <sage@newdream.net>
2011-01-19	ceph: avoid immediate cap check after import	Sage Weil
	The NODELAY flag avoids the heuristics that delay cap (issued/wanted) release. There's no reason for that after we import a cap, and it kills whatever benefit we get from those delays. Signed-off-by: Sage Weil <sage@newdream.net>
2011-01-19	ceph: fix flushing of caps vs cap import	Sage Weil
	If we are mid-flush and a cap is migrated to another node, we need to resend the cap flush message to the new MDS, and do so with the original flush_seq to avoid leaking across a sync boundary. Previously we didn't redo the flush (we only flushed newly dirty data), which would cause a later sync to hang forever. Signed-off-by: Sage Weil <sage@newdream.net>
2011-01-19	ceph: fix erroneous cap flush to non-auth mds	Sage Weil
	The int flushing is global and not cleared on each iteration of the loop, which can cause a second flush of caps to any MDSs with ids greater than the auth. Signed-off-by: Sage Weil <sage@newdream.net>
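The effect of a loop-carried flag that is never reset can be shown with a toy model. The function `count_flushes` and its parameters are hypothetical, and the loop is a drastic simplification of the real per-MDS cap handling; it only reproduces the shape of the bug described above.

```c
/*
 * Count how many MDSs would be sent a flush when only the auth MDS
 * should receive one. With reset_each_iteration == 0 the flag set for
 * the auth MDS leaks into every later iteration.
 */
int count_flushes(int nr_mds, int auth, int reset_each_iteration)
{
    int flushing = 0;
    int sent = 0;

    for (int mds = 0; mds < nr_mds; mds++) {
        if (reset_each_iteration)
            flushing = 0;      /* the fix: clear per iteration */
        if (mds == auth)
            flushing = 1;      /* dirty caps go to the auth MDS only */
        if (flushing)
            sent++;
    }
    return sent;
}
```

With four MDSs and auth at index 1, the unfixed variant counts three flushes (to MDSs 1, 2 and 3), while the variant that resets the flag counts one.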
2011-01-19	ceph: fix cap_wanted_delay_{min,max} mount option initialization	Sage Weil
	These were initialized to 0 instead of the default, fallout from the RBD refactor in 3d14c5d2b6e15c21d8e5467dc62d33127c23a644. Signed-off-by: Sage Weil <sage@newdream.net>
2011-01-13	ceph: fix xattr rbtree search	Sage Weil
	Fix xattr name comparison in rbtree search for strings that share a prefix. The name argument is null terminated, but the xattr name is not, so we need to use strncmp, but that means adjusting for the case where name is a prefix of xattr->name. The corresponding case in __set_xattr() already handles this properly (although in that case name is also not null terminated). Reported-by: Sergiy Kibrik <sakib@meta.ua> Signed-off-by: Sage Weil <sage@newdream.net>
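A user-space sketch of comparing a NUL-terminated name against a stored name that has an explicit length and no terminator. The function name `xattr_name_cmp` is hypothetical and the code is not the kernel's; it shows one way to handle the case where one name is a prefix of the other.

```c
#include <stddef.h>
#include <string.h>

/*
 * name is NUL-terminated; xname holds xlen bytes and is NOT terminated.
 * Returns <0, 0 or >0 like strcmp.
 */
int xattr_name_cmp(const char *name, const char *xname, size_t xlen)
{
    size_t name_len = strlen(name);
    size_t n = name_len < xlen ? name_len : xlen;
    int c = strncmp(name, xname, n);   /* never reads past xlen bytes */

    if (c == 0) {
        /* Common prefix matches: the shorter name sorts first. */
        if (name_len < xlen)
            c = -1;
        else if (name_len > xlen)
            c = 1;
    }
    return c;
}
```

Without the length check, "user.foo" and "user.foobar" would compare equal over their shared prefix, and an rbtree lookup could return the wrong node.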
2011-01-13	ceph: fix getattr on directory when using norbytes	Yehuda Sadeh
	The norbytes mount option was broken, and when doing getattr on a directory it returned the rbytes instead of the number of entities. This commit fixes it. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
2011-01-13	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client	Linus Torvalds
	* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
	  rbd: fix cleanup when trying to mount inexistent image
	  net/ceph: make ceph_msgr_wq non-reentrant
	  ceph: fsc->*_wq's aren't used in memory reclaim path
	  ceph: Always free allocated memory in osdmap_decode()
	  ceph: Makefile: Remove unnessary code
	  ceph: associate requests with opening sessions
	  ceph: drop redundant r_mds field
	  ceph: implement DIRLAYOUTHASH feature to get dir layout from MDS
	  ceph: add dir_layout to inode
2011-01-12	ceph: fsc->*_wq's aren't used in memory reclaim path	Tejun Heo
	fsc->*_wq's aren't depended upon during memory reclaim. Convert to alloc_workqueue() w/o WQ_MEM_RECLAIM. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Sage Weil <sage@newdream.net> Cc: ceph-devel@vger.kernel.org Signed-off-by: Sage Weil <sage@newdream.net>
2011-01-12	ceph: Makefile: Remove unnessary code	Tracey Dent
	Remove the if/else conditional because the code is in mainline and there is no need for it to be there. Also, change the Makefile to use <modules>-y instead of <modules>-objs because -objs is deprecated and not mentioned in Documentation/kbuild/makefiles.txt. Signed-off-by: Tracey Dent <tdent48227@gmail.com> Signed-off-by: Sage Weil <sage@newdream.net>
2011-01-12	ceph: associate requests with opening sessions	Sage Weil
	Associate requests with sessions that aren't yet open. This makes the debugfs mdsc request list more informative. Signed-off-by: Sage Weil <sage@newdream.net>
2011-01-12	ceph: drop redundant r_mds field	Sage Weil
	The r_mds field is redundant, since we can find the same information at r_session->s_mds, and when r_session is NULL then r_mds is meaningless. Signed-off-by: Sage Weil <sage@newdream.net>