aboutsummaryrefslogtreecommitdiff
path: root/fs/gfs2/rgrp.c
AgeCommit message (Collapse)Author
2020-06-05gfs2: Turn gl_delete into a delayed workAndreas Gruenbacher
This requires flushing delayed work items in gfs2_make_fs_ro (which is called before unmounting a filesystem). When inodes are deleted and then recreated, pending gl_delete work items would have no effect because the inode generations will have changed, so we can cancel any pending gl_delete works before reusing iopen glocks. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-03-27gfs2: don't lock sd_log_flush_lock in try_rgrp_unlinkBob Peterson
In function try_rgrp_unlink, we added a temporary lock of the sd_log_flush_lock while searching the bitmaps. This protected us from problems in which dinodes being freed were still in a state of flux because the rgrp was in an active transaction. It was a kludge. Now that we've straightened out the code for inode eviction, deletes, and all the recovery mess, we no longer need this kludge. This patch removes it, and should improve performance. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2020-03-27gfs2: Split gfs2_rsqa_delete into gfs2_rs_delete and gfs2_qa_putAndreas Gruenbacher
Keeping reservations and quotas separate helps reviewing the code. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2020-03-27gfs2: Change inode qa_data to allow multiple usersBob Peterson
Before this patch, multiple users called gfs2_qa_alloc which allocated a qadata structure to the inode, if quotas are turned on. Later, in file close or evict, the structure was deleted with gfs2_qa_delete. But there can be several competing processes who need access to the structure. There were races between file close (release) and the others. Thus, a release could delete the structure out from under a process that relied upon its existence. For example, chown. This patch changes the management of the qadata structures to be a get/put scheme. Function gfs2_qa_alloc has been changed to gfs2_qa_get and if the structure is allocated, the count essentially starts out at 1. Function gfs2_qa_delete has been renamed to gfs2_qa_put, and the last guy to decrement the count to 0 frees the memory. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2020-03-27gfs2: eliminate gfs2_rsqa_alloc in favor of gfs2_qa_allocBob Peterson
Before this patch, multiple callers called gfs2_rsqa_alloc to force the existence of a reservations structure and a quota data structure if needed. However, now the reservations are handled separately, so the quota data is only the quota data. So we eliminate the one in favor of just calling gfs2_qa_alloc directly. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2020-02-10gfs2: Rework how rgrp buffer_heads are managedBob Peterson
Before this patch, the rgrp code had a serious problem related to how it managed buffer_heads for resource groups. The problem caused file system corruption, especially in cases of journal replay. When an rgrp glock was demoted to transfer ownership to a different cluster node, do_xmote() first calls rgrp_go_sync and then rgrp_go_inval, as expected. When it calls rgrp_go_sync, that called gfs2_rgrp_brelse() that dropped the buffer_head reference count. In most cases, the reference count went to zero, which is right. However, there were other places where the buffers are handled differently. After rgrp_go_sync, do_xmote called rgrp_go_inval which called gfs2_rgrp_brelse a second time, then rgrp_go_inval's call to truncate_inode_pages_range would get rid of the pages in memory, but only if the reference count drops to 0. Unfortunately, gfs2_rgrp_brelse was setting bi->bi_bh = NULL. So when rgrp_go_sync called gfs2_rgrp_brelse, it lost the pointer to the buffer_heads in cases where the reference count was still 1. Therefore, when rgrp_go_inval called gfs2_rgrp_brelse a second time, it failed the check for "if (bi->bi_bh)" and thus failed to call brelse a second time. Because of that, the reference count on those buffers sometimes failed to drop from 1 to 0. And that caused function truncate_inode_pages_range to keep the pages in page cache rather than freeing them. The next time the rgrp glock was acquired, the metadata read of the rgrp buffers re-used the pages in memory, which were now wrong because they were likely modified by the other node who acquired the glock in EX (which is why we demoted the glock). This re-use of the page cache caused corruption because changes made by the other nodes were never seen, so the bitmaps were inaccurate. For some reason, the problem became most apparent when journal replay forced the replay of rgrps in memory, which caused newer rgrp data to be overwritten by the older in-core pages. A big part of the problem was that the rgrp buffer were released in multiple places: The go_unlock function would release them when the glock was released rather than when the glock is demoted, which is clearly wrong because our intent was to cache them until the glock is demoted from SH or EX. This patch attempts to clean up the mess and make one consistent and centralized mechanism for managing the rgrp buffer_heads by implementing several changes: 1. It eliminates the call to gfs2_rgrp_brelse() from rgrp_go_sync. We don't want to release the buffers or zero the pointers when syncing for the reasons stated above. It only makes sense to release them when the glock is actually invalidated (go_inval). And when we do, then we set the bh pointers to NULL. 2. The go_unlock function (which was only used for rgrps) is eliminated, as we've talked about doing many times before. The go_unlock function was called too early in the glock dq process, and should not happen until the glock is invalidated. 3. It also eliminates the call to rgrp_brelse in gfs2_clear_rgrpd. That will now happen automatically when the rgrp glocks are demoted, and shouldn't happen any sooner or later than that. Instead, function gfs2_clear_rgrpd has been modified to demote the rgrp glocks, and therefore, free those pages, before the remaining glocks are culled by gfs2_gl_hash_clear. This prevents the gl_object from hanging around when the glocks are culled. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-02-10gfs2: Report errors before withdrawAndreas Gruenbacher
In gfs2_rgrp_verify and compute_bitstructs, make sure to report errors before withdrawing the filesystem: otherwise, when we withdraw first and withdraw is configured to panic, we'll never get to the error reporting. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2020-01-21gfs2: remove unused LBIT macrosAlex Shi
Since commit 223b2b889f37 ("GFS2: Fix alignment issue and tidy gfs2_bitfit"), these 3 macros aren't used anymore, so remove them. Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-09-03gfs2: Fix possible fs name overflowsBob Peterson
This patch fixes three places in which temporary character buffers could overflow due to the addition of the file system id from patch 3792ce973f07. Thanks to Dan Carpenter for pointing it out. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-06-27gfs2: replace more printk with calls to fs_info and friendsBob Peterson
This patch replaces a few leftover printk errors with calls to fs_info and similar, so that the file system having the error is properly logged. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-06-27gfs2: dump fsid when dumping glock problemsBob Peterson
Before this patch, if a glock error was encountered, the glock with the problem was dumped. But sometimes you may have lots of file systems mounted, and that doesn't tell you which file system it was for. This patch adds a new boolean parameter fsid to the dump_glock family of functions. For non-error cases, such as dumping the glocks debugfs file, the fsid is not dumped in order to keep lock dumps and glocktop as clean as possible. For all error cases, such as GLOCK_BUG_ON, the file system id is now printed. This will make it easier to debug. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-06-05treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 398Thomas Gleixner
Based on 1 normalized pattern(s): this copyrighted material is made available to anyone wishing to use modify copy or redistribute it subject to the terms and conditions of the gnu general public license version 2 extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 44 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Allison Randal <allison@lohutok.net> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190531081038.653000175@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-07gfs2: Rename gfs2_trans_{add_unrevoke => remove_revoke}Andreas Gruenbacher
Rename gfs2_trans_add_unrevoke to gfs2_trans_remove_revoke: there is no such thing as an "unrevoke" object; all this function does is remove existing revoke objects plus some bookkeeping. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-05-07gfs2: Fix loop in gfs2_rbm_find (v2)Andreas Gruenbacher
Fix the resource group wrap-around logic in gfs2_rbm_find that commit e579ed4f44 broke. The bug can lead to unnecessary repeated scanning of the same bitmaps; there is a risk that future changes will turn this into an endless loop. This is an updated version of commit 2d29f6b96d ("gfs2: Fix loop in gfs2_rbm_find") which ended up being reverted because it introduced a performance regression in iozone (see commit e74c98ca2d). Changes since v1: - Simplify the wrap-around logic. - Handle the case where each resource group only has a single bitmap block (small filesystem). - Update rd_extfail_pt whenever we scan the entire bitmap, even when we don't start the scan at the very beginning of the bitmap. Fixes: e579ed4f446e ("GFS2: Introduce rbm field bii") Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-01-31gfs2: Revert "Fix loop in gfs2_rbm_find"Andreas Gruenbacher
This reverts commit 2d29f6b96d8f80322ed2dd895bca590491c38d34. It turns out that the fix can lead to a ~20 percent performance regression in initial writes to the page cache according to iozone. Let's revert this for now to have more time for a proper fix. Cc: stable@vger.kernel.org # v3.13+ Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-12-12gfs2: Dump nrpages for inodes and their glocksBob Peterson
This patch is based on an idea from Steve Whitehouse. The idea is to dump the number of pages for inodes in the glock dumps. The additional locking required me to drop const from quite a few places. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2018-12-12gfs2: Fix loop in gfs2_rbm_findAndreas Gruenbacher
Fix the resource group wrap-around logic in gfs2_rbm_find that commit e579ed4f44 broke. The bug can lead to unnecessary repeated scanning of the same bitmaps; there is a risk that future changes will turn this into an endless loop. Fixes: e579ed4f44 ("GFS2: Introduce rbm field bii") Cc: stable@vger.kernel.org # v3.13+ Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-11-09gfs2: Put bitmap buffers in put_superAndreas Gruenbacher
gfs2_put_super calls gfs2_clear_rgrpd to destroy the gfs2_rgrpd objects attached to the resource group glocks. That function should release the buffers attached to the gfs2_bitmap objects (bi_bh), but the call to gfs2_rgrp_brelse for doing that is missing. When gfs2_releasepage later runs across these buffers which are still referenced, it refuses to free them. This causes the pages the buffers are attached to to remain referenced as well. With enough mount/unmount cycles, the system will eventually run out of memory. Fix this by adding the missing call to gfs2_rgrp_brelse in gfs2_clear_rgrpd. (Also fix a gfs2_rgrp_relse -> gfs2_rgrp_brelse typo in a comment.) Fixes: 39b0f1e92908 ("GFS2: Don't brelse rgrp buffer_heads every allocation") Cc: stable@vger.kernel.org # v4.2+ Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2018-10-12gfs2: Pass resource group to rgblk_freeAndreas Gruenbacher
Function rgblk_free can only deal with one resource group at a time, so pass that resource group is as a parameter. Several of the callers already have the resource group at hand, so we only need additional lookup code in a few places. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Steven Whitehouse <swhiteho@redhat.com>
2018-10-12gfs2: Remove unnecessary gfs2_rlist_alloc parameterBob Peterson
The state parameter of gfs2_rlist_alloc is set to LM_ST_EXCLUSIVE in all calls, so remove it and hardcode that state in gfs2_rlist_alloc instead. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Steven Whitehouse <swhiteho@redhat.com>
2018-10-12gfs2: Fix marking bitmaps non-fullAndreas Gruenbacher
Reservations in gfs can span multiple gfs2_bitmaps (but they won't span multiple resource groups). When removing a reservation, we want to clear the GBF_FULL flags of all involved gfs2_bitmaps, not just that of the first bitmap. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Steven Whitehouse <swhiteho@redhat.com>
2018-10-12gfs2: Fix some minor typosAndreas Gruenbacher
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Steven Whitehouse <swhiteho@redhat.com>
2018-10-12gfs2: Rename bitmap.bi_{len => bytes}Andreas Gruenbacher
This field indicates the size of the bitmap in bytes, similar to how the bi_blocks field indicates the size of the bitmap in blocks. In count_unlinked, replace an instance of bi_bytes * GFS2_NBBY by bi_blocks. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Steven Whitehouse <swhiteho@redhat.com>
2018-10-12gfs2: Remove unused RGRP_RSRV_MINBYTES definitionAndreas Gruenbacher
This definition is only used to define RGRP_RSRV_MINBLKS, with no benefit over defining RGRP_RSRV_MINBLKS directly. In addition, instead of forcing RGRP_RSRV_MINBLKS to be of type u32, cast it to that type where that type is required. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Steven Whitehouse <swhiteho@redhat.com>
2018-10-12gfs2: Move rs_{sizehint, rgd_gh} fields into the inodeAndreas Gruenbacher
Move the rs_sizehint and rs_rgd_gh fields from struct gfs2_blkreserv into the inode: they are more closely related to the inode than to a particular reservation. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Steven Whitehouse <swhiteho@redhat.com>
2018-10-12gfs2: Clean up out-of-bounds check in gfs2_rbm_from_blockAndreas Gruenbacher
We already have a function that checks if a block is within a resource group, so use that in gfs2_rbm_from_block as well. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Steven Whitehouse <swhiteho@redhat.com>
2018-10-12gfs2: Always check the result of gfs2_rbm_from_blockAndreas Gruenbacher
When gfs2_rbm_from_block fails, the rbm it returns is undefined, so we always want to make sure gfs2_rbm_from_block has succeeded before looking at the rbm. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Steven Whitehouse <swhiteho@redhat.com>
2018-10-05gfs2: Use fs_* functions instead of pr_* function where we canBob Peterson
Before this patch, various errors and messages were reported using the pr_* functions: pr_err, pr_warn, pr_info, etc., but that does not tell you which gfs2 mount had the problem, which is often vital to debugging. This patch changes the calls from pr_* to fs_* in most of the messages so that the file system id is printed along with the message. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-08-28gfs2: Don't set GFS2_RDF_UPTODATE when the lvb is updatedBob Peterson
The GFS2_RDF_UPTODATE flag in the rgrp is used to determine when a rgrp buffer is valid. It's cleared when the glock is invalidated, signifying that the buffer data is now invalid. But before this patch, function update_rgrp_lvb was setting the flag when it determined it had a valid lvb. But that's an invalid assumption: just because you have a valid lvb doesn't mean you have valid buffers. After all, another node may have made the lvb valid, and this node just fetched it from the glock via dlm. Consider this scenario: 1. The file system is mounted with RGRPLVB option. 2. In gfs2_inplace_reserve it locks the rgrp glock EX, but thanks to GL_SKIP, it skips the gfs2_rgrp_bh_get. 3. Since loops == 0 and the allocation target (ap->target) is bigger than the largest known chunk of blocks in the rgrp (rs->rs_rbm.rgd->rd_extfail_pt) it skips that rgrp and bypasses the call to gfs2_rgrp_bh_get there as well. 4. update_rgrp_lvb sees the lvb MAGIC number is valid, so bypasses gfs2_rgrp_bh_get, but it still sets sets GFS2_RDF_UPTODATE due to this invalid assumption. 5. The next time update_rgrp_lvb is called, it sees the bit is set and just returns 0, assuming both the lvb and rgrp are both uptodate. But since this is a smaller allocation, or space has been freed by another node, thus adjusting the lvb values, it decides to use the rgrp for allocations, with invalid rd_free due to the fact it was never updated. This patch changes update_rgrp_lvb so it doesn't set the UPTODATE flag anymore. That way, it has no choice but to fetch the latest values. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-08-28gfs2: improve debug information when lvb mismatches are foundBob Peterson
Before this patch, gfs2_rgrp_bh_get would check for lvb mismatches, but it wouldn't tell you what was actually wrong. This patch adds more information to help us debug it. It also makes rgrp consistency checks dump any bad rgrps, and the rgrp dump code dump any lvbs as well as the rgrp itself. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Acked-by: Steven Whitehouse <swhiteho@redhat.com>
2018-08-08gfs2: eliminate update_rgrp_lvb_unlinkedBob Peterson
Function update_rgrp_lvb_unlinked used to do the same thing as be32_add_cpu. This patch removes it in favor of using be32_add_cpu directly. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Andrew Price <anprice@redhat.com>
2018-08-07gfs2: Fix gfs2_testbit to use clone bitmapsBob Peterson
Function gfs2_testbit is called in three places. Two of those places, gfs2_alloc_extent and gfs2_unaligned_extlen, should be using the clone bitmaps, not the "real" bitmaps. Function gfs2_unaligned_extlen is used by the block reservations scheme to determine the length of an extent of free blocks. Before this patch, it wasn't using the clone bitmap, which means recently-freed blocks were treated as free blocks for the purposes of an allocation. This patch adds a new parameter to gfs2_testbit to indicate whether or not the clone bitmaps should be used (if available). Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
2018-07-26gfs2: cleanup: call gfs2_rgrp_ondisk2lvb from gfs2_rgrp_outBob Peterson
Before this patch gfs2_rgrp_ondisk2lvb was called after every call to gfs2_rgrp_out. This patch just calls it directly from within gfs2_rgrp_out, and moves the function to be before it so we don't need a function prototype. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
2018-07-25GFS2: rgrp free blocks used incorrectlyBob Peterson
Before this patch, several functions in rgrp.c checked the value of rgd->rd_free_clone. That does not take into account blocks that were reserved by a multi-block reservation. This causes a problem when space gets tight in the file system. For example, when function gfs2_inplace_reserve checks to see if a rgrp has enough blocks to satisfy the request, it can accept a rgrp that it should reject because, although there are enough blocks to satisfy the request _now_, those blocks may be reserved for another running process. A second problem with this occurs when we've reserved the remaining blocks in an rgrp: function rg_mblk_search() can reject an rgrp improperly because it calculates: u32 free_blocks = rgd->rd_free_clone - rgd->rd_reserved; But rd_reserved includes blocks that the current process just reserved in its own call to inplace_reserve. For example, it can reserve the last 128 blocks of an rgrp, then reject that same rgrp because the above calculates out to free_blocks = 0; Consequences include, but are not limited to, (1) leaving holes, and thus increasing file system fragmentation, and (2) reporting file system is full long before it actually is. This patch introduces a new function, rgd_free, which returns the number of clone-free blocks (blocks that are truly free as opposed to blocks that are still being used because an unlinked file is still open) minus the number of blocks reserved by processes, but not counting the blocks we ourselves reserved (because obviously we need to allocate them). Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2018-07-24Merge branch 'iomap-write' into linux-gfs2/for-nextAndreas Gruenbacher
Pull in the gfs2 iomap-write changes: Tweak the existing code to properly support iomap write and eliminate an unnecessary special case in gfs2_block_map. Implement iomap write support for buffered and direct I/O. Simplify some of the existing code and eliminate code that is no longer used: gfs2: Remove gfs2_write_{begin,end} gfs2: iomap direct I/O support gfs2: gfs2_extent_length cleanup gfs2: iomap buffered write support gfs2: Further iomap cleanups This is based on the following changes on the xfs 'iomap-4.19-merge' branch: iomap: add private pointer to struct iomap iomap: add a page_done callback iomap: generic inline data handling iomap: complete partial direct I/O writes synchronously iomap: mark newly allocated buffer heads as new fs: factor out a __generic_write_end helper Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2018-07-24gfs2: Don't reject a supposedly full bitmap if we have blocks reservedBob Peterson
Before this patch, you could get into situations like this: 1. Process 1 searches for X free blocks, finds them, makes a reservation 2. Process 2 searches for free blocks in the same rgrp, but now the bitmap is full because process 1's reservation is skipped over. So it marks the bitmap as GBF_FULL. 3. Process 1 tries to allocate blocks from its own reservation, but since the GBF_FULL bit is set, it skips over the rgrp and searches elsewhere, thus not using its own reservation. This patch adds an additional check to allow processes to use their own reservations. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2018-07-05gfs2: Eliminate redundant ip->i_rgdAndreas Gruenbacher
GFS2 remembers the last rgrp used for allocations in ip->i_rgd. However, block allocations are made by way of a reservations structure, ip->i_res, which keeps the last rgrp in ip->i_res.rs_rgd, and ip->i_res is kept in sync with ip->i_res.rs_rgd, so it's redundant. Get rid of ip->i_rgd and just use ip->i_res.rs_rgd in its place. Based on patches by Robert Peterson. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-07-04gfs2: Stop messing with ip->i_rgd in the rlist codeAndreas Gruenbacher
In the resource group list code, keep the last resource group added in the last position in the array. Check against that instead of messing with ip->i_rgd. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-06-21gfs2: eliminate rs_inum and reduce the size of gfs2 inodesBob Peterson
Before this patch, block reservations kept track of the inode number. At one point, that was a valid thing to do. However, since we made the reservation a part of the inode (rather than a pointer to a separate allocated object) the reservation can determine the inode number by using container_of. This saves us a little memory in our inode. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Acked-by: Steven Whitehouse <swhiteho@redhat.com> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
2018-06-12treewide: kmalloc() -> kmalloc_array()Kees Cook
The kmalloc() function has a 2-factor argument form, kmalloc_array(). This patch replaces cases of: kmalloc(a * b, gfp) with: kmalloc_array(a * b, gfp) as well as handling cases of: kmalloc(a * b * c, gfp) with: kmalloc(array3_size(a, b, c), gfp) as it's slightly less ugly than: kmalloc_array(array_size(a, b), c, gfp) This does, however, attempt to ignore constant size factors like: kmalloc(4 * 1024, gfp) though any constants defined via macros get caught up in the conversion. Any factors with a sizeof() of "unsigned char", "char", and "u8" were dropped, since they're redundant. The tools/ directory was manually excluded, since it has its own implementation of kmalloc(). The Coccinelle script used for this was: // Fix redundant parens around sizeof(). @@ type TYPE; expression THING, E; @@ ( kmalloc( - (sizeof(TYPE)) * E + sizeof(TYPE) * E , ...) | kmalloc( - (sizeof(THING)) * E + sizeof(THING) * E , ...) ) // Drop single-byte sizes and redundant parens. @@ expression COUNT; typedef u8; typedef __u8; @@ ( kmalloc( - sizeof(u8) * (COUNT) + COUNT , ...) | kmalloc( - sizeof(__u8) * (COUNT) + COUNT , ...) | kmalloc( - sizeof(char) * (COUNT) + COUNT , ...) | kmalloc( - sizeof(unsigned char) * (COUNT) + COUNT , ...) | kmalloc( - sizeof(u8) * COUNT + COUNT , ...) | kmalloc( - sizeof(__u8) * COUNT + COUNT , ...) | kmalloc( - sizeof(char) * COUNT + COUNT , ...) | kmalloc( - sizeof(unsigned char) * COUNT + COUNT , ...) ) // 2-factor product with sizeof(type/expression) and identifier or constant. @@ type TYPE; expression THING; identifier COUNT_ID; constant COUNT_CONST; @@ ( - kmalloc + kmalloc_array ( - sizeof(TYPE) * (COUNT_ID) + COUNT_ID, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * COUNT_ID + COUNT_ID, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * (COUNT_CONST) + COUNT_CONST, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * COUNT_CONST + COUNT_CONST, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * (COUNT_ID) + COUNT_ID, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * COUNT_ID + COUNT_ID, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * (COUNT_CONST) + COUNT_CONST, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * COUNT_CONST + COUNT_CONST, sizeof(THING) , ...) ) // 2-factor product, only identifiers. @@ identifier SIZE, COUNT; @@ - kmalloc + kmalloc_array ( - SIZE * COUNT + COUNT, SIZE , ...) // 3-factor product with 1 sizeof(type) or sizeof(expression), with // redundant parens removed. @@ expression THING; identifier STRIDE, COUNT; type TYPE; @@ ( kmalloc( - sizeof(TYPE) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kmalloc( - sizeof(TYPE) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kmalloc( - sizeof(TYPE) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kmalloc( - sizeof(TYPE) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kmalloc( - sizeof(THING) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kmalloc( - sizeof(THING) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kmalloc( - sizeof(THING) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kmalloc( - sizeof(THING) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) ) // 3-factor product with 2 sizeof(variable), with redundant parens removed. @@ expression THING1, THING2; identifier COUNT; type TYPE1, TYPE2; @@ ( kmalloc( - sizeof(TYPE1) * sizeof(TYPE2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kmalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kmalloc( - sizeof(THING1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kmalloc( - sizeof(THING1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kmalloc( - sizeof(TYPE1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) | kmalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) ) // 3-factor product, only identifiers, with redundant parens removed. @@ identifier STRIDE, SIZE, COUNT; @@ ( kmalloc( - (COUNT) * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - COUNT * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - COUNT * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - (COUNT) * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - COUNT * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - (COUNT) * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - (COUNT) * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - COUNT * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) ) // Any remaining multi-factor products, first at least 3-factor products, // when they're not all constants... @@ expression E1, E2, E3; constant C1, C2, C3; @@ ( kmalloc(C1 * C2 * C3, ...) | kmalloc( - (E1) * E2 * E3 + array3_size(E1, E2, E3) , ...) | kmalloc( - (E1) * (E2) * E3 + array3_size(E1, E2, E3) , ...) | kmalloc( - (E1) * (E2) * (E3) + array3_size(E1, E2, E3) , ...) | kmalloc( - E1 * E2 * E3 + array3_size(E1, E2, E3) , ...) ) // And then all remaining 2 factors products when they're not all constants, // keeping sizeof() as the second factor argument. @@ expression THING, E1, E2; type TYPE; constant C1, C2, C3; @@ ( kmalloc(sizeof(THING) * C2, ...) | kmalloc(sizeof(TYPE) * C2, ...) | kmalloc(C1 * C2 * C3, ...) | kmalloc(C1 * C2, ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * (E2) + E2, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * E2 + E2, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * (E2) + E2, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * E2 + E2, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - (E1) * E2 + E1, E2 , ...) | - kmalloc + kmalloc_array ( - (E1) * (E2) + E1, E2 , ...) | - kmalloc + kmalloc_array ( - E1 * E2 + E1, E2 , ...) ) Signed-off-by: Kees Cook <keescook@chromium.org>
2018-06-04GFS2: gfs2_free_extlen can return an extent that is too longBob Peterson
Function gfs2_free_extlen calculates the length of an extent of free blocks that may be reserved. The end pointer was calculated as end = start + bh->b_size but b_size is incorrect because the bitmap usually stops prior to the end of the buffer data on the last bitmap. What this means is that when you do a write, you can reserve a chunk of blocks that runs off the end of the last bitmap. For example, I've got a file system where there is only one bitmap for each rgrp, so ri_length==1. I saw cases in which iozone tried to do a big write, grabbed a large block reservation, chose rgrp 5464152, which has ri_data0 5464153 and ri_data 8188. So 5464153 + 8188 = 5472341 which is the end of the rgrp. When it grabbed a reservation it got back: 5470936, length 7229. But 5470936 + 7229 = 5478165. So the reservation starts inside the rgrp but runs 5824 blocks past the end of the bitmap. This patch fixes the calculation so it won't exceed the last bitmap. It also adds a BUG_ON to guard against overflows in the future. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-01-30gfs2: Add a few missing newlines in messagesAndreas Gruenbacher
Some of the info, warning, and error messages are missing their trailing newline. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-01-23GFS2: Log the reason for log flushes in every log headerBob Peterson
This patch just adds the capability for GFS2 to track which function called gfs2_log_flush. This should make it easier to diagnose problems based on the sequence of events found in the journals. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
2018-01-23GFS2: Introduce new gfs2_log_header_v2Bob Peterson
This patch adds a new structure called gfs2_log_header_v2 which is used to store expanded fields into previously unused areas of the log headers (i.e., this change is backwards compatible). Some of these are used for debug purposes so we can backtrack when problems occur. Others are reserved for future expansion. This patch is based on a prototype from Steve Whitehouse. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2018-01-18Merge branch 'punch-hole' of ↵Bob Peterson
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2.git
2018-01-17gfs2: Add gfs2_blk2rgrpd comment and fix incorrect useSteven Whitehouse
Document when to use gfs2_blk2rgrpd for "inexact" resource group matching. Based on that, fix an incorrect use of gfs2_blk2rgrpd in sweep_bh_for_rgrps. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-12-12gfs2: Add a crc field to resource group headersAndrew Price
Add the rg_crc field to store a crc32 of the gfs2_rgrp structure. This allows us to check resource group headers' integrity and removes the requirement to check them against the rindex entries in fsck. If this field is found to be zero, it should be ignored (or updated with an accurate value). Signed-off-by: Andrew Price <anprice@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-12-12gfs2: Add rindex fields to rgrp headersAndrew Price
Add rg_data0, rg_data and rg_bitbytes to struct gfs2_rgrp. The fields are identical to their counterparts in struct gfs2_rindex and are intended to reduce the use of the rindex. For now the fields are only written back as the in-memory equivalents in struct gfs2_rgrpd are set using values from the rindex. However, they are needed at this point so that userspace can make use of them, allowing a migration away from the rindex over time. The new fields take up previously reserved space which was explicitly zeroed on write so, in clusters with mixed kernels, these fields could get zeroed after being set and this should not be treated as an error. Signed-off-by: Andrew Price <anprice@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-12-12gfs2: Add a next-resource-group pointer to resource groupsAndrew Price
Add a new rg_skip field to struct gfs2_rgrp, replacing __pad. The rg_skip field has the following meaning: - If rg_skip is zero, it is considered unset and not useful. - If rg_skip is non-zero, its value will be the number of blocks between this rgrp's address and the next rgrp's address. This can be used as a hint by fsck.gfs2 when rebuilding a bad rindex, for example. This will provide less dependency on the rindex in future, and allow tools such as fsck.gfs2 to iterate the resource groups without keeping the rindex around. The field is updated in gfs2_rgrp_out() so that existing file systems will have it set. This means that any resource groups that aren't ever written will not be updated. The final rgrp is a special case as there is no next rgrp, so it will always have a rg_skip of 0 (unless the fs is extended). Before this patch, gfs2_rgrp_out() zeroes the __pad field explicitly, so the rg_skip field can get set back to 0 in cases where nodes with and without this patch are mixed in a cluster. In some cases, the field may bounce between being set by one node and then zeroed by another which may harm performance slightly, e.g. when two nodes create many small files. In testing this situation is rare but it becomes more likely as the filesystem fills up and there are fewer resource groups to choose from. The problem goes away when all nodes are running with this patch. Dipping into the space currently occupied by the rg_reserved field would have resulted in the same problem as it is also explicitly zeroed, so unfortunately there is no other way around it. Signed-off-by: Andrew Price <anprice@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-11-27GFS2: Combine gfs2_free_di with gfs2_free_uninit_diBob Peterson
Before this patch, function gfs2_free_di was 4 lines of code, and one of those lines was to call gfs2_free_uninit_di. Although unlikely, if function gfs2_free_uninit_di encountered an error finding the block to be freed, the error was silently ignored by the caller, which went ahead and improperly did a quota-change operation and meta_wipe despite the error. This patch combines the two functions into one to make the code more readable and fixes the bug by returning from the combined function before it takes those next incorrect steps. Signed-off-by: Bob Peterson <rpeterso@redhat.com>