xfs: only run COW extent recovery when there are no live extents

As part of multiple customer escalations due to file data corruption after copy on write operations, I wrote some fstests that use fsstress to hammer on COW to shake things loose. Regrettably, I caught some filesystem shutdowns due to incorrect rmap operations with the following loop: mount <filesystem> # (0) fsstress <run only readonly ops> & # (1) while true; do fsstress <run all ops> mount -o remount,ro # (2) fsstress <run only readonly ops> mount -o remount,rw # (3) done When (2) happens, notice that (1) is still running. xfs_remount_ro will call xfs_blockgc_stop to walk the inode cache to free all the COW extents, but the blockgc mechanism races with (1)'s reader threads to take IOLOCKs and loses, which means that it doesn't clean them all out. Call such a file (A). When (3) happens, xfs_remount_rw calls xfs_reflink_recover_cow, which walks the ondisk refcount btree and frees any COW extent that it finds. This function does not check the inode cache, which means that incore COW forks of inode (A) is now inconsistent with the ondisk metadata. If one of those former COW extents are allocated and mapped into another file (B) and someone triggers a COW to the stale reservation in (A), A's dirty data will be written into (B) and once that's done, those blocks will be transferred to (A)'s data fork without bumping the refcount. The results are catastrophic -- file (B) and the refcount btree are now corrupt. In the first patch, we fixed the race condition in (2) so that (A) will always flush the COW fork. In this second patch, we move the _recover_cow call to the initial mount call in (0) for safety. As mentioned previously, xfs_reflink_recover_cow walks the refcount btree looking for COW staging extents, and frees them. This was intended to be run at mount time (when we know there are no live inodes) to clean up any leftover staging events that may have been left behind during an unclean shutdown. As a time "optimization" for readonly mounts, we deferred this to the ro->rw transition, not realizing that any failure to clean all COW forks during a rw->ro transition would result in catastrophic corruption. Therefore, remove this optimization and only run the recovery routine when we're guaranteed not to have any COW staging extents anywhere, which means we always run this at mount time. While we're at it, move the callsite to xfs_log_mount_finish because any refcount btree expansion (however unlikely given that we're removing records from the right side of the index) must be fed by a per-AG reservation, which doesn't exist in its current location. Fixes: 174edb0e46e5 ("xfs: store in-progress CoW allocations in the refcount btree") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Chandan Babu R <chandan.babu@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
author: Darrick J. Wong 2021-12-15 11:52:23 -0800
committer: Darrick J. Wong 2021-12-21 09:49:41 -0800
commit: 7993f1a431bc5271369d359941485a9340658ac3 (patch)
tree: 7d473433c6d8df7192a65e5a58a5efb42b2f0166 /fs/xfs/xfs_super.c
parent: 7b7820b83f230036fc48c3e7fb280c48c58adebf (diff)
1 files changed, 0 insertions, 9 deletions
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 778b57b1f020..c7ac486ca5d3 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -1739,15 +1739,6 @@ xfs_remount_rw(
 	 */
 	xfs_restore_resvblks(mp);
 	xfs_log_work_queue(mp);
-
-	/* Recover any CoW blocks that never got remapped. */
-	error = xfs_reflink_recover_cow(mp);
-	if (error) {
-		xfs_err(mp,
-			"Error %d recovering leftover CoW allocations.", error);
-		xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
-		return error;
-	}
 	xfs_blockgc_start(mp);
 
 	/* Create the per-AG metadata reservation pool .*/
author	Darrick J. Wong	2021-12-15 11:52:23 -0800
committer	Darrick J. Wong	2021-12-21 09:49:41 -0800
commit	7993f1a431bc5271369d359941485a9340658ac3 (patch)
tree	7d473433c6d8df7192a65e5a58a5efb42b2f0166 /fs/xfs/xfs_super.c
parent	7b7820b83f230036fc48c3e7fb280c48c58adebf (diff)