aboutsummaryrefslogtreecommitdiff
path: root/fs/ext4
AgeCommit message (Collapse)Author
2014-10-30ext4: make ext4_ext_convert_to_initialized() return proper number of blocksJan Kara
ext4_ext_convert_to_initialized() can return more blocks than are actually allocated from map->m_lblk in case where initial part of the on-disk extent is zeroed out. Luckily this doesn't have serious consequences because the caller currently uses the return value only to unmap metadata buffers. Anyway this is a data corruption/exposure problem waiting to happen so fix it. Coverity-id: 1226848 Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2014-10-30ext4: bail early when clearing inode journal flag failsJan Kara
When clearing inode journal flag, we call jbd2_journal_flush() to force all the journalled data to their final locations. Currently we ignore when this fails and continue clearing inode journal flag. This isn't a big problem because when jbd2_journal_flush() fails, journal is likely aborted anyway. But it can still lead to somewhat confusing results so rather bail out early. Coverity-id: 989044 Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2014-10-30ext4: bail out from make_indexed_dir() on first errorJan Kara
When ext4_handle_dirty_dx_node() or ext4_handle_dirty_dirent_node() fail, there's really something wrong with the fs and there's no point in continuing further. Just return error from make_indexed_dir() in that case. Also initialize frames array so that if we return early due to error, dx_release() doesn't try to dereference uninitialized memory (which could happen also due to error in do_split()). Coverity-id: 741300 Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
2014-10-30ext4: prevent bugon on race between write/fcntlDmitry Monakhov
O_DIRECT flags can be toggeled via fcntl(F_SETFL). But this value checked twice inside ext4_file_write_iter() and __generic_file_write() which result in BUG_ON inside ext4_direct_IO. Let's initialize iocb->private unconditionally. TESTCASE: xfstest:generic/036 https://patchwork.ozlabs.org/patch/402445/ #TYPICAL STACK TRACE: kernel BUG at fs/ext4/inode.c:2960! invalid opcode: 0000 [#1] SMP Modules linked in: brd iTCO_wdt lpc_ich mfd_core igb ptp dm_mirror dm_region_hash dm_log dm_mod CPU: 6 PID: 5505 Comm: aio-dio-fcntl-r Not tainted 3.17.0-rc2-00176-gff5c017 #161 Hardware name: Intel Corporation W2600CR/W2600CR, BIOS SE5C600.86B.99.99.x028.061320111235 06/13/2011 task: ffff88080e95a7c0 ti: ffff88080f908000 task.ti: ffff88080f908000 RIP: 0010:[<ffffffff811fabf2>] [<ffffffff811fabf2>] ext4_direct_IO+0x162/0x3d0 RSP: 0018:ffff88080f90bb58 EFLAGS: 00010246 RAX: 0000000000000400 RBX: ffff88080fdb2a28 RCX: 00000000a802c818 RDX: 0000040000080000 RSI: ffff88080d8aeb80 RDI: 0000000000000001 RBP: ffff88080f90bbc8 R08: 0000000000000000 R09: 0000000000001581 R10: 0000000000000000 R11: 0000000000000000 R12: ffff88080d8aeb80 R13: ffff88080f90bbf8 R14: ffff88080fdb28c8 R15: ffff88080fdb2a28 FS: 00007f23b2055700(0000) GS:ffff880818400000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f23b2045000 CR3: 000000080cedf000 CR4: 00000000000407e0 Stack: ffff88080f90bb98 0000000000000000 7ffffffffffffffe ffff88080fdb2c30 0000000000000200 0000000000000200 0000000000000001 0000000000000200 ffff88080f90bbc8 ffff88080fdb2c30 ffff88080f90be08 0000000000000200 Call Trace: [<ffffffff8112ca9d>] generic_file_direct_write+0xed/0x180 [<ffffffff8112f2b2>] __generic_file_write_iter+0x222/0x370 [<ffffffff811f495b>] ext4_file_write_iter+0x34b/0x400 [<ffffffff811bd709>] ? aio_run_iocb+0x239/0x410 [<ffffffff811bd709>] ? aio_run_iocb+0x239/0x410 [<ffffffff810990e5>] ? local_clock+0x25/0x30 [<ffffffff810abd94>] ? __lock_acquire+0x274/0x700 [<ffffffff811f4610>] ? ext4_unwritten_wait+0xb0/0xb0 [<ffffffff811bd756>] aio_run_iocb+0x286/0x410 [<ffffffff810990e5>] ? local_clock+0x25/0x30 [<ffffffff810ac359>] ? lock_release_holdtime+0x29/0x190 [<ffffffff811bc05b>] ? lookup_ioctx+0x4b/0xf0 [<ffffffff811bde3b>] do_io_submit+0x55b/0x740 [<ffffffff811bdcaa>] ? do_io_submit+0x3ca/0x740 [<ffffffff811be030>] SyS_io_submit+0x10/0x20 [<ffffffff815ce192>] system_call_fastpath+0x16/0x1b Code: 01 48 8b 80 f0 01 00 00 48 8b 18 49 8b 45 10 0f 85 f1 01 00 00 48 03 45 c8 48 3b 43 48 0f 8f e3 01 00 00 49 83 7c 24 18 00 75 04 <0f> 0b eb fe f0 ff 83 ec 01 00 00 49 8b 44 24 18 8b 00 85 c0 89 RIP [<ffffffff811fabf2>] ext4_direct_IO+0x162/0x3d0 RSP <ffff88080f90bb58> Reported-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Cc: stable@vger.kernel.org
2014-10-30ext4: remove extent status procfs files if journal load failsDarrick J. Wong
If we can't load the journal, remove the procfs files for the extent status information file to avoid leaking resources. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
2014-10-30ext4: disallow changing journal_csum option during remountDarrick J. Wong
ext4 does not permit changing the metadata or journal checksum feature flag while mounted. Until we decide to support that, don't allow a remount to change the journal_csum flag (right now we silently fail to change anything). Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2014-10-30ext4: enable journal checksum when metadata checksum feature enabledDarrick J. Wong
If metadata checksumming is turned on for the FS, we need to tell the journal to use checksumming too. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
2014-10-30ext4: fix oops when loading block bitmap failedJan Kara
When we fail to load block bitmap in __ext4_new_inode() we will dereference NULL pointer in ext4_journal_get_write_access(). So check for error from ext4_read_block_bitmap(). Coverity-id: 989065 Cc: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2014-10-30ext4: fix overflow when updating superblock backups after resizeJan Kara
When there are no meta block groups update_backups() will compute the backup block in 32-bit arithmetics thus possibly overflowing the block number and corrupting the filesystem. OTOH filesystems without meta block groups larger than 16 TB should be rare. Fix the problem by doing the counting in 64-bit arithmetics. Coverity-id: 741252 CC: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Lukas Czerner <lczerner@redhat.com>
2014-10-26Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs updates from Al Viro: "overlayfs merge + leak fix for d_splice_alias() failure exits" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: overlayfs: embed middle into overlay_readdir_data overlayfs: embed root into overlay_readdir_data overlayfs: make ovl_cache_entry->name an array instead of pointer overlayfs: don't hold ->i_mutex over opening the real directory fix inode leaks on d_splice_alias() failure exits fs: limit filesystem stacking depth overlay: overlay filesystem documentation overlayfs: implement show_options overlayfs: add statfs support overlay filesystem shmem: support RENAME_WHITEOUT ext4: support RENAME_WHITEOUT vfs: add RENAME_WHITEOUT vfs: add whiteout support vfs: export check_sticky() vfs: introduce clone_private_mount() vfs: export __inode_permission() to modules vfs: export do_splice_direct() to modules vfs: add i_op->dentry_open()
2014-10-24ext4: support RENAME_WHITEOUTMiklos Szeredi
Add whiteout support to ext4_rename(). A whiteout inode (chrdev/0,0) is created before the rename takes place. The whiteout inode is added to the old entry instead of deleting it. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2014-10-20Merge tag 'ext4_for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: "A large number of cleanups and bug fixes, with some (minor) journal optimizations" [ This got sent to me before -rc1, but was stuck in my spam folder. - Linus ] * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (67 commits) ext4: check s_chksum_driver when looking for bg csum presence ext4: move error report out of atomic context in ext4_init_block_bitmap() ext4: Replace open coded mdata csum feature to helper function ext4: delete useless comments about ext4_move_extents ext4: fix reservation overflow in ext4_da_write_begin ext4: add ext4_iget_normal() which is to be used for dir tree lookups ext4: don't orphan or truncate the boot loader inode ext4: grab missed write_count for EXT4_IOC_SWAP_BOOT ext4: optimize block allocation on grow indepth ext4: get rid of code duplication ext4: fix over-defensive complaint after journal abort ext4: fix return value of ext4_do_update_inode ext4: fix mmap data corruption when blocksize < pagesize vfs: fix data corruption when blocksize < pagesize for mmaped data ext4: fold ext4_nojournal_sops into ext4_sops ext4: support freezing ext2 (nojournal) file systems ext4: fold ext4_sync_fs_nojournal() into ext4_sync_fs() ext4: don't check quota format when there are no quota files jbd2: simplify calling convention around __jbd2_journal_clean_checkpoint_list jbd2: avoid pointless scanning of checkpoint lists ...
2014-10-15Merge branch 'for-3.18-consistent-ops' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu Pull percpu consistent-ops changes from Tejun Heo: "Way back, before the current percpu allocator was implemented, static and dynamic percpu memory areas were allocated and handled separately and had their own accessors. The distinction has been gone for many years now; however, the now duplicate two sets of accessors remained with the pointer based ones - this_cpu_*() - evolving various other operations over time. During the process, we also accumulated other inconsistent operations. This pull request contains Christoph's patches to clean up the duplicate accessor situation. __get_cpu_var() uses are replaced with with this_cpu_ptr() and __this_cpu_ptr() with raw_cpu_ptr(). Unfortunately, the former sometimes is tricky thanks to C being a bit messy with the distinction between lvalues and pointers, which led to a rather ugly solution for cpumask_var_t involving the introduction of this_cpu_cpumask_var_ptr(). This converts most of the uses but not all. Christoph will follow up with the remaining conversions in this merge window and hopefully remove the obsolete accessors" * 'for-3.18-consistent-ops' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (38 commits) irqchip: Properly fetch the per cpu offset percpu: Resolve ambiguities in __get_cpu_var/cpumask_var_t -fix ia64: sn_nodepda cannot be assigned to after this_cpu conversion. Use __this_cpu_write. percpu: Resolve ambiguities in __get_cpu_var/cpumask_var_t Revert "powerpc: Replace __get_cpu_var uses" percpu: Remove __this_cpu_ptr clocksource: Replace __this_cpu_ptr with raw_cpu_ptr sparc: Replace __get_cpu_var uses avr32: Replace __get_cpu_var with __this_cpu_write blackfin: Replace __get_cpu_var uses tile: Use this_cpu_ptr() for hardware counters tile: Replace __get_cpu_var uses powerpc: Replace __get_cpu_var uses alpha: Replace __get_cpu_var ia64: Replace __get_cpu_var uses s390: cio driver &__get_cpu_var replacements s390: Replace __get_cpu_var uses mips: Replace __get_cpu_var uses MIPS: Replace __get_cpu_var uses in FPU emulator. arm: Replace __this_cpu_ptr with raw_cpu_ptr ...
2014-10-14ext4: check s_chksum_driver when looking for bg csum presenceDarrick J. Wong
Convert the ext4_has_group_desc_csum predicate to look for a checksum driver instead of the metadata_csum flag and change the bg checksum calculation function to look for GDT_CSUM before taking the crc16 path. Without this patch, if we mount with ^uninit_bg,^metadata_csum and later metadata_csum gets turned on by accident, the block group checksum functions will incorrectly assume that checksumming is enabled (metadata_csum) but that crc16 should be used (!s_chksum_driver). This is totally wrong, so fix the predicate and the checksum formula selection. (Granted, if the metadata_csum feature bit gets enabled on a live FS then something underhanded is going on, but we could at least avoid writing garbage into the on-disk fields.) Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Dmitry Monakhov <dmonakhov@openvz.org> Cc: stable@vger.kernel.org
2014-10-13ext4: move error report out of atomic context in ext4_init_block_bitmap()Dmitry Monakhov
Error report likely result in IO so it is bad idea to do it from atomic context. This patch should fix following issue: BUG: sleeping function called from invalid context at include/linux/buffer_head.h:349 in_atomic(): 1, irqs_disabled(): 0, pid: 137, name: kworker/u128:1 5 locks held by kworker/u128:1/137: #0: ("writeback"){......}, at: [<ffffffff81085618>] process_one_work+0x228/0x4d0 #1: ((&(&wb->dwork)->work)){......}, at: [<ffffffff81085618>] process_one_work+0x228/0x4d0 #2: (jbd2_handle){......}, at: [<ffffffff81242622>] start_this_handle+0x712/0x7b0 #3: (&ei->i_data_sem){......}, at: [<ffffffff811fa387>] ext4_map_blocks+0x297/0x430 #4: (&(&bgl->locks[i].lock)->rlock){......}, at: [<ffffffff811f3180>] ext4_read_block_bitmap_nowait+0x5d0/0x630 CPU: 3 PID: 137 Comm: kworker/u128:1 Not tainted 3.17.0-rc2-00184-g82752e4 #165 Hardware name: Intel Corporation W2600CR/W2600CR, BIOS SE5C600.86B.99.99.x028.061320111235 06/13/2011 Workqueue: writeback bdi_writeback_workfn (flush-1:0) 0000000000000411 ffff880813777288 ffffffff815c7fdc ffff880813777288 ffff880813a8bba0 ffff8808137772a8 ffffffff8108fb30 ffff880803e01e38 ffff880803e01e38 ffff8808137772c8 ffffffff811a8d53 ffff88080ecc6000 Call Trace: [<ffffffff815c7fdc>] dump_stack+0x51/0x6d [<ffffffff8108fb30>] __might_sleep+0xf0/0x100 [<ffffffff811a8d53>] __sync_dirty_buffer+0x43/0xe0 [<ffffffff811a8e03>] sync_dirty_buffer+0x13/0x20 [<ffffffff8120f581>] ext4_commit_super+0x1d1/0x230 [<ffffffff8120fa03>] save_error_info+0x23/0x30 [<ffffffff8120fd06>] __ext4_error+0xb6/0xd0 [<ffffffff8120f260>] ? ext4_group_desc_csum+0x140/0x190 [<ffffffff811f2d8c>] ext4_read_block_bitmap_nowait+0x1dc/0x630 [<ffffffff8122e23a>] ext4_mb_init_cache+0x21a/0x8f0 [<ffffffff8113ae95>] ? lru_cache_add+0x55/0x60 [<ffffffff8112e16c>] ? add_to_page_cache_lru+0x6c/0x80 [<ffffffff8122eaa0>] ext4_mb_init_group+0x190/0x280 [<ffffffff8122ec51>] ext4_mb_good_group+0xc1/0x190 [<ffffffff8123309a>] ext4_mb_regular_allocator+0x17a/0x410 [<ffffffff8122c821>] ? ext4_mb_use_preallocated+0x31/0x380 [<ffffffff81233535>] ? ext4_mb_new_blocks+0x205/0x8e0 [<ffffffff8116ed5c>] ? kmem_cache_alloc+0xfc/0x180 [<ffffffff812335b0>] ext4_mb_new_blocks+0x280/0x8e0 [<ffffffff8116f2c4>] ? __kmalloc+0x144/0x1c0 [<ffffffff81221797>] ? ext4_find_extent+0x97/0x320 [<ffffffff812257f4>] ext4_ext_map_blocks+0xbc4/0x1050 [<ffffffff811fa387>] ? ext4_map_blocks+0x297/0x430 [<ffffffff811fa3ab>] ext4_map_blocks+0x2bb/0x430 [<ffffffff81200e43>] ? ext4_init_io_end+0x23/0x50 [<ffffffff811feb44>] ext4_writepages+0x564/0xaf0 [<ffffffff815cde3b>] ? _raw_spin_unlock+0x2b/0x40 [<ffffffff810ac7bd>] ? lock_release_non_nested+0x2fd/0x3c0 [<ffffffff811a009e>] ? writeback_sb_inodes+0x10e/0x490 [<ffffffff811a009e>] ? writeback_sb_inodes+0x10e/0x490 [<ffffffff811377e3>] do_writepages+0x23/0x40 [<ffffffff8119c8ce>] __writeback_single_inode+0x9e/0x280 [<ffffffff811a026b>] writeback_sb_inodes+0x2db/0x490 [<ffffffff811a0664>] wb_writeback+0x174/0x2d0 [<ffffffff810ac359>] ? lock_release_holdtime+0x29/0x190 [<ffffffff811a0863>] wb_do_writeback+0xa3/0x200 [<ffffffff811a0a40>] bdi_writeback_workfn+0x80/0x230 [<ffffffff81085618>] ? process_one_work+0x228/0x4d0 [<ffffffff810856cd>] process_one_work+0x2dd/0x4d0 [<ffffffff81085618>] ? process_one_work+0x228/0x4d0 [<ffffffff81085c1d>] worker_thread+0x35d/0x460 [<ffffffff810858c0>] ? process_one_work+0x4d0/0x4d0 [<ffffffff810858c0>] ? process_one_work+0x4d0/0x4d0 [<ffffffff8108a885>] kthread+0xf5/0x100 [<ffffffff810990e5>] ? local_clock+0x25/0x30 [<ffffffff8108a790>] ? __init_kthread_worker+0x70/0x70 [<ffffffff815ce2ac>] ret_from_fork+0x7c/0xb0 [<ffffffff8108a790>] ? __init_kthread_work Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
2014-10-13ext4: Replace open coded mdata csum feature to helper functionDmitry Monakhov
Besides the fact that this replacement improves code readability it also protects from errors caused direct EXT4_S(sb)->s_es manipulation which may result attempt to use uninitialized csum machinery. #Testcase_BEGIN IMG=/dev/ram0 MNT=/mnt mkfs.ext4 $IMG mount $IMG $MNT #Enable feature directly on disk, on mounted fs tune2fs -O metadata_csum $IMG # Provoke metadata update, likey result in OOPS touch $MNT/test umount $MNT #Testcase_END # Replacement script @@ expression E; @@ - EXT4_HAS_RO_COMPAT_FEATURE(E, EXT4_FEATURE_RO_COMPAT_METADATA_CSUM) + ext4_has_metadata_csum(E) https://bugzilla.kernel.org/show_bug.cgi?id=82201 Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
2014-10-11ext4: delete useless comments about ext4_move_extentsXiaoguang Wang
In patch 'ext4: refactor ext4_move_extents code base', Dmitry Monakhov has refactored ext4_move_extents' implementation, but forgot to update the corresponding comments, this patch will try to delete some useless comments. Reviewed-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Xiaoguang Wang <wangxg.fnst@cn.fujitsu.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2014-10-11ext4: fix reservation overflow in ext4_da_write_beginEric Sandeen
Delalloc write journal reservations only reserve 1 credit, to update the inode if necessary. However, it may happen once in a filesystem's lifetime that a file will cross the 2G threshold, and require the LARGE_FILE feature to be set in the superblock as well, if it was not set already. This overruns the transaction reservation, and can be demonstrated simply on any ext4 filesystem without the LARGE_FILE feature already set: dd if=/dev/zero of=testfile bs=1 seek=2147483646 count=1 \ conv=notrunc of=testfile sync dd if=/dev/zero of=testfile bs=1 seek=2147483647 count=1 \ conv=notrunc of=testfile leads to: EXT4-fs: ext4_do_update_inode:4296: aborting transaction: error 28 in __ext4_handle_dirty_super EXT4-fs error (device loop0) in ext4_do_update_inode:4301: error 28 EXT4-fs error (device loop0) in ext4_reserve_inode_write:4757: Readonly filesystem EXT4-fs error (device loop0) in ext4_dirty_inode:4876: error 28 EXT4-fs error (device loop0) in ext4_da_write_end:2685: error 28 Adjust the number of credits based on whether the flag is already set, and whether the current write may extend past the LARGE_FILE limit. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Cc: stable@vger.kernel.org
2014-10-05ext4: add ext4_iget_normal() which is to be used for dir tree lookupsTheodore Ts'o
If there is a corrupted file system which has directory entries that point at reserved, metadata inodes, prohibit them from being used by treating them the same way we treat Boot Loader inodes --- that is, mark them to be bad inodes. This prohibits them from being opened, deleted, or modified via chmod, chown, utimes, etc. In particular, this prevents a corrupted file system which has a directory entry which points at the journal inode from being deleted and its blocks released, after which point Much Hilarity Ensues. Reported-by: Sami Liedes <sami.liedes@iki.fi> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
2014-10-05ext4: don't orphan or truncate the boot loader inodeTheodore Ts'o
The boot loader inode (inode #5) should never be visible in the directory hierarchy, but it's possible if the file system is corrupted that there will be a directory entry that points at inode #5. In order to avoid accidentally trashing it, when such a directory inode is opened, the inode will be marked as a bad inode, so that it's not possible to modify (or read) the inode from userspace. Unfortunately, when we unlink this (invalid/illegal) directory entry, we will put the bad inode on the ophan list, and then when try to unlink the directory, we don't actually remove the bad inode from the orphan list before freeing in-memory inode structure. This means the in-memory orphan list is corrupted, leading to a kernel oops. In addition, avoid truncating a bad inode in ext4_destroy_inode(), since truncating the boot loader inode is not a smart thing to do. Reported-by: Sami Liedes <sami.liedes@iki.fi> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
2014-10-03ext4: grab missed write_count for EXT4_IOC_SWAP_BOOTDmitry Monakhov
Otherwise this provokes complain like follows: WARNING: CPU: 12 PID: 5795 at fs/ext4/ext4_jbd2.c:48 ext4_journal_check_start+0x4e/0xa0() Modules linked in: brd iTCO_wdt lpc_ich mfd_core igb ptp dm_mirror dm_region_hash dm_log dm_mod CPU: 12 PID: 5795 Comm: python Not tainted 3.17.0-rc2-00175-gae5344f #158 Hardware name: Intel Corporation W2600CR/W2600CR, BIOS SE5C600.86B.99.99.x028.061320111235 06/13/2011 0000000000000030 ffff8808116cfd28 ffffffff815c7dfc 0000000000000030 0000000000000000 ffff8808116cfd68 ffffffff8106ce8c ffff8808116cfdc8 ffff880813b16000 ffff880806ad6ae8 ffffffff81202008 0000000000000000 Call Trace: [<ffffffff815c7dfc>] dump_stack+0x51/0x6d [<ffffffff8106ce8c>] warn_slowpath_common+0x8c/0xc0 [<ffffffff81202008>] ? ext4_ioctl+0x9e8/0xeb0 [<ffffffff8106ceda>] warn_slowpath_null+0x1a/0x20 [<ffffffff8122867e>] ext4_journal_check_start+0x4e/0xa0 [<ffffffff81228c10>] __ext4_journal_start_sb+0x90/0x110 [<ffffffff81202008>] ext4_ioctl+0x9e8/0xeb0 [<ffffffff8107b0bd>] ? ptrace_stop+0x24d/0x2f0 [<ffffffff81088530>] ? alloc_pid+0x480/0x480 [<ffffffff8107b1f2>] ? ptrace_do_notify+0x92/0xb0 [<ffffffff81186545>] do_vfs_ioctl+0x4e5/0x550 [<ffffffff815cdbcb>] ? _raw_spin_unlock_irq+0x2b/0x40 [<ffffffff81186603>] SyS_ioctl+0x53/0x80 [<ffffffff815ce2ce>] tracesys+0xd0/0xd5 Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
2014-10-01ext4: optimize block allocation on grow indepthDmitry Monakhov
It is reasonable to prepend newly created index to older one. [ Dropped no longer used function parameter newext. -tytso ] Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2014-10-01ext4: get rid of code duplicationDmitry Monakhov
Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2014-10-01ext4: fix over-defensive complaint after journal abortDmitry Monakhov
Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2014-10-01ext4: fix return value of ext4_do_update_inodeLi Xi
When ext4_do_update_inode() gets error from ext4_inode_blocks_set(), error number should be returned. Signed-off-by: Li Xi <lixi@ddn.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz>
2014-10-01ext4: fix mmap data corruption when blocksize < pagesizeJan Kara
Use truncate_isize_extended() when hole is being created in a file so that ->page_mkwrite() will get called for the partial tail page if it is mmaped (see the first patch in the series for details). Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
2014-09-24Merge branch 'for-linus' of ↵Tejun Heo
git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block into for-3.18 This is to receive 0a30288da1ae ("blk-mq, percpu_ref: implement a kludge for SCSI blk-mq stall during probe") which implements __percpu_ref_kill_expedited() to work around SCSI blk-mq stall. The commit reverted and patches to implement proper fix will be added. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Kent Overstreet <kmo@daterainc.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Christoph Hellwig <hch@lst.de>
2014-09-18ext4: fold ext4_nojournal_sops into ext4_sopsTheodore Ts'o
There's no longer any need to have a separate set of super_operations for nojournal mode. Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2014-09-18ext4: support freezing ext2 (nojournal) file systemsTheodore Ts'o
Through an oversight, when we added nojournal support to ext4, we didn't add support to allow file system freezing. This is relatively easy to add, so let's do it. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reported-by: Dexuan Cui <decui@microsoft.com>
2014-09-18ext4: fold ext4_sync_fs_nojournal() into ext4_sync_fs()Theodore Ts'o
This allows us to eliminate duplicate code, and eventually allow us to also fold ext4_sops and ext4_nojournal_sops together. Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2014-09-18ext4: don't check quota format when there are no quota filesJan Kara
The check whether quota format is set even though there are no quota files with journalled quota is pointless and it actually makes it impossible to turn off journalled quotas (as there's no way to unset journalled quota format). Just remove the check. CC: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2014-09-16ext4: explicitly inform user about orphan list cleanupDmitry Monakhov
Production fs likely compiled/mounted w/o jbd debugging, so orphan list clearing will be silent. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2014-09-16ext4: check EA value offset when loadingDarrick J. Wong
When loading extended attributes, check each entry's value offset to make sure it doesn't collide with the entries. Without this check it is easy to crash the kernel by mounting a malicious FS containing a file with an EA wherein e_value_offs = 0 and e_value_size > 0 and then deleting the EA, which corrupts the name list. (See the f_ea_value_crash test's FS image in e2fsprogs for an example.) Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
2014-09-11ext4: don't keep using page if inline conversion failsDarrick J. Wong
If inline->extent conversion fails (most probably due to ENOSPC) and we release the temporary page that we allocated to transfer the file contents, don't keep using the page pointer after releasing the page. This occasionally leads to complaints about evicting locked pages or hangs when blocksize > pagesize, because it's possible for the page to get reallocated elsewhere in the meantime. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Tao Ma <tm@tao.ma>
2014-09-11ext4: validate external journal superblock checksumDarrick J. Wong
If the external journal device has metadata_csum enabled, verify that the superblock checksum matches the block before we try to mount. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2014-09-11jbd2: fix journal checksum feature flag handlingDarrick J. Wong
Clear all three journal checksum feature flags before turning on whichever journal checksum options we want. Rearrange the error checking so that newer flags get complained about first. Reported-by: TR Reardon <thomas_reardon@hotmail.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2014-09-11ext4: provide separate operations for sysfs feature filesLukas Czerner
Currently sysfs feature files uses ext4_attr_ops as the file operations to show/store data. However the feature files is not supposed to contain any data at all, the sole existence of the file means that the module support the feature. Moreover, none of the sysfs feature attributes actually register show/store functions so that would not be a problem. However if a sysfs feature attribute register a show or store function we might be in trouble because the kobject in this case is _not_ embedded in the ext4_sb_info structure as ext4_attr_show/store expect. So just to be safe, provide separate empty sysfs_ops to use in ext4_feat_ktype. This might safe us from potential problems in the future. As a bonus we can "store" something more descriptive than nothing in the files, so let it contain "enabled" to make it clear that the feature is really present in the module. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2014-09-11ext4: add sysfs entry showing whether the fs contains errorsLukas Czerner
Currently there is no easy way to tell that the mounted file system contains errors other than checking for log messages, or reading the information directly from superblock. This patch adds new sysfs entries: errors_count (number of fs errors we encounter) first_error_time (unix timestamp for the first error we see) last_error_time (unix timestamp for the last error we see) If the file system is not marked as containing errors then any of the file will return 0. Otherwise it will contain valid information. More details about the errors should as always be found in the logs. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2014-09-11ext4: don't use MAXQUOTAS valueJan Kara
MAXQUOTAS value defines maximum number of quota types VFS supports. This isn't necessarily the number of types ext4 supports. Although ext4 will support project quotas, use ext4 private definition for consistency with other filesystems. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2014-09-08percpu_counter: add @gfp to percpu_counter_init()Tejun Heo
Percpu allocator now supports allocation mask. Add @gfp to percpu_counter_init() so that !GFP_KERNEL allocation masks can be used with percpu_counters too. We could have left percpu_counter_init() alone and added percpu_counter_init_gfp(); however, the number of users isn't that high and introducing _gfp variants to all percpu data structures would be quite ugly, so let's just do the conversion. This is the one with the most users. Other percpu data structures are a lot easier to convert. This patch doesn't make any functional difference. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Jan Kara <jack@suse.cz> Acked-by: "David S. Miller" <davem@davemloft.net> Cc: x86@kernel.org Cc: Jens Axboe <axboe@kernel.dk> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org>
2014-09-04ext4: use non-movable memory for the ext4 superblockGioh Kim
Since the ext4 superblock is not released until the file system is unmounted, allocate the buffer cache entry for the ext4 superblock out of the non-moveable are to allow page migrations and thus CMA allocations to more easily succeed if the CMA area is limited. Signed-off-by: Gioh Kim <gioh.kim@lge.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
2014-09-04ext4: renumber EXT4_EX_* flags to avoid flag aliasing problemsTheodore Ts'o
Suggested-by: Andreas Dilger <adilger@dilger.ca> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2014-09-04ext4: drop the EXT4_STATE_DELALLOC_RESERVED flagTheodore Ts'o
Having done a full regression test, we can now drop the DELALLOC_RESERVED state flag. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
2014-09-04ext4: prepare to drop EXT4_STATE_DELALLOC_RESERVEDTheodore Ts'o
The EXT4_STATE_DELALLOC_RESERVED flag was originally implemented because it was too hard to make sure the mballoc and get_block flags could be reliably passed down through all of the codepaths that end up calling ext4_mb_new_blocks(). Since then, we have mb_flags passed down through most of the code paths, so getting rid of EXT4_STATE_DELALLOC_RESERVED isn't as tricky as it used to. This commit plumbs in the last of what is required, and then adds a WARN_ON check to make sure we haven't missed anything. If this passes a full regression test run, we can then drop EXT4_STATE_DELALLOC_RESERVED. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
2014-09-04ext4: pass allocation_request struct to ext4_(alloc,splice)_branchTheodore Ts'o
Instead of initializing the allocation_request structure in ext4_alloc_branch(), set it up in ext4_ind_map_blocks(), and then pass it to ext4_alloc_branch() and ext4_splice_branch(). This allows ext4_ind_map_blocks to pass flags in the allocation request structure without having to add Yet Another argument to ext4_alloc_branch(). Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
2014-09-03ext4: avoid trying to kfree an ERR_PTR pointerTheodore Ts'o
Thanks to Dan Carpenter for extending smatch to find bugs like this. (This was found using a development version of smatch.) Fixes: 36de928641ee48b2078d3fe9514242aaa2f92013 Reported-by: Dan Carpenter <dan.carpenter@oracle.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
2014-09-01ext4: track extent status tree shrinker delay staticticsZheng Liu
This commit adds some statictics in extent status tree shrinker. The purpose to add these is that we want to collect more details when we encounter a stall caused by extent status tree shrinker. Here we count the following statictics: stats: the number of all objects on all extent status trees the number of reclaimable objects on lru list cache hits/misses the last sorted interval the number of inodes on lru list average: scan time for shrinking some objects the number of shrunk objects maximum: the inode that has max nr. of objects on lru list the maximum scan time for shrinking some objects The output looks like below: $ cat /proc/fs/ext4/sda1/es_shrinker_info stats: 28228 objects 6341 reclaimable objects 5281/631 cache hits/misses 586 ms last sorted interval 250 inodes on lru list average: 153 us scan time 128 shrunk objects maximum: 255 inode (255 objects, 198 reclaimable) 125723 us max scan time If the lru list has never been sorted, the following line will not be printed: 586ms last sorted interval If there is an empty lru list, the following lines also will not be printed: 250 inodes on lru list ... maximum: 255 inode (255 objects, 198 reclaimable) 0 us max scan time Meanwhile in this commit a new trace point is defined to print some details in __ext4_es_shrink(). Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: Jan Kara <jack@suse.cz> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2014-09-01ext4: improve extents status tree trace pointZheng Liu
This commit improves the trace point of extents status tree. We rename trace_ext4_es_shrink_enter in ext4_es_count() because it is also used in ext4_es_scan() and we can not identify them from the result. Further this commit fixes a variable name in trace point in order to keep consistency with others. Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: Jan Kara <jack@suse.cz> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2014-09-01ext4: fix comments about get_blocksSeunghun Lee
get_blocks is renamed to get_block. Signed-off-by: Seunghun Lee <waydi1@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2014-09-01ext4: enable block_validity by defaultDarrick J. Wong
Enable by default the block_validity feature, which checks for collisions between newly allocated blocks and critical system metadata. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>