aboutsummaryrefslogtreecommitdiff
path: root/fs/notify/mark.c
AgeCommit message (Collapse)Author
2017-10-31fsnotify: convert fsnotify_mark.refcnt from atomic_t to refcount_tElena Reshetova
atomic_t variables are currently used to implement reference counters with the following properties: - counter is initialized to 1 using atomic_set() - a resource is freed upon counter reaching zero - once counter reaches zero, its further increments aren't allowed - counter schema uses basic atomic operations (set, inc, inc_not_zero, dec_and_test, etc.) Such atomic variables should be converted to a newly provided refcount_t type and API that prevents accidental counter overflows and underflows. This is important since overflows and underflows can lead to use-after-free situation and be exploitable. The variable fsnotify_mark.refcnt is used as pure reference counter. Convert it to refcount_t and fix up the operations. Suggested-by: Kees Cook <keescook@chromium.org> Reviewed-by: David Windsor <dwindsor@gmail.com> Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com> Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Jan Kara <jack@suse.cz>
2017-10-31fsnotify: fix pinning group in fsnotify_prepare_user_wait()Miklos Szeredi
Blind increment of group's user_waits is not enough, we could be far enough in the group's destruction that it isn't taken into account (i.e. grabbing the mark ref afterwards doesn't guarantee that it was the ref coming from the _group_ that was grabbed). Instead we need to check (under lock) that the mark is still attached to the group after having obtained a ref to the mark. If not, skip it. Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: 9385a84d7e1f ("fsnotify: Pass fsnotify_iter_info into handle_event handler") Cc: <stable@vger.kernel.org> # v4.12 Signed-off-by: Jan Kara <jack@suse.cz>
2017-10-31fsnotify: clean up fsnotify_prepare/finish_user_wait()Miklos Szeredi
This patch doesn't actually fix any bug, just paves the way for fixing mark and group pinning. Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Cc: <stable@vger.kernel.org> # v4.12 Signed-off-by: Jan Kara <jack@suse.cz>
2017-10-31fsnotify: Protect bail out path of fsnotify_add_mark_locked() properlyJan Kara
When fsnotify_add_mark_locked() fails it cleans up the mark it was adding. Since the mark is already visible in group's list, we should protect update of mark->flags with mark->lock. I'm not aware of any real issues this could cause (since we also hold group->mark_mutex) but better be safe and obey locking rules properly. Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2017-04-24fsnotify: remove a stray unlockDan Carpenter
We recently shifted this code around, so we're no longer holding the lock on this path. Fixes: 755b5bc681eb ("fsnotify: Remove indirection from mark list addition") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jan Kara <jack@suse.cz>
2017-04-10fsnotify: Move ->free_mark callback to fsnotify_opsJan Kara
Pointer to ->free_mark callback unnecessarily occupies one long in each fsnotify_mark although they are the same for all marks from one notification group. Move the callback pointer to fsnotify_ops. Reviewed-by: Miklos Szeredi <mszeredi@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2017-04-10fsnotify: Add group pointer in fsnotify_init_mark()Jan Kara
Currently we initialize mark->group only in fsnotify_add_mark_lock(). However we will need to access fsnotify_ops of corresponding group from fsnotify_put_mark() so we need mark->group initialized earlier. Do that in fsnotify_init_mark() which has a consequence that once fsnotify_init_mark() is called on a mark, the mark has to be destroyed by fsnotify_put_mark(). Reviewed-by: Miklos Szeredi <mszeredi@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2017-04-10fsnotify: Remove fsnotify_detach_group_marks()Jan Kara
The function is already mostly contained in what fsnotify_clear_marks_by_group() does. Just update that function to not select marks when all of them should be destroyed and remove fsnotify_detach_group_marks(). Reviewed-by: Miklos Szeredi <mszeredi@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2017-04-10fsnotify: Rename fsnotify_clear_marks_by_group_flags()Jan Kara
The _flags() suffix in the function name was more confusing than explaining so just remove it. Also rename the argument from 'flags' to 'type' to better explain what the function expects. Reviewed-by: Miklos Szeredi <mszeredi@redhat.com> Suggested-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2017-04-10fsnotify: Remove fsnotify_set_mark_{,ignored_}mask_locked()Jan Kara
These helpers are now only a simple assignment and just obfuscate what is going on. Remove them. Reviewed-by: Miklos Szeredi <mszeredi@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2017-04-10fsnotify: Provide framework for dropping SRCU lock in ->handle_eventJan Kara
fanotify wants to drop fsnotify_mark_srcu lock when waiting for response from userspace so that the whole notification subsystem is not blocked during that time. This patch provides a framework for safely getting mark reference for a mark found in the object list which pins the mark in that list. We can then drop fsnotify_mark_srcu, wait for userspace response and then safely continue iteration of the object list once we reaquire fsnotify_mark_srcu. Reviewed-by: Miklos Szeredi <mszeredi@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2017-04-10fsnotify: Remove special handling of mark destruction on group shutdownJan Kara
Currently we queue all marks for destruction on group shutdown and then destroy them from fsnotify_destroy_group() instead from a worker thread which is the usual path. However worker can already be processing some list of marks to destroy so this does not make 100% all marks are really destroyed by the time group is shut down. This isn't a big problem as each mark holds group reference and thus group stays partially alive until all marks are really freed but there's no point in complicating our lives - just wait for the delayed work to be finished instead. Reviewed-by: Miklos Szeredi <mszeredi@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2017-04-10fsnotify: Detach mark from object list when last reference is droppedJan Kara
Instead of removing mark from object list from fsnotify_detach_mark(), remove the mark when last reference to the mark is dropped. This will allow fanotify to wait for userspace response to event without having to hold onto fsnotify_mark_srcu. To avoid pinning inodes by elevated refcount (and thus e.g. delaying file deletion) while someone holds mark reference, we detach connector from the object also from fsnotify_destroy_marks() and not only after removing last mark from the list as it was now. Reviewed-by: Miklos Szeredi <mszeredi@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2017-04-10fsnotify: Move queueing of mark for destruction into fsnotify_put_mark()Jan Kara
Currently we queue mark into a list of marks for destruction in __fsnotify_free_mark() and keep the last mark reference dangling. After the worker waits for SRCU period, it drops the last reference to the mark which frees it. This scheme has the disadvantage that if we hold reference to a mark and drop and reacquire SRCU lock, the mark can get freed immediately which is slightly inconvenient and we will need to avoid this in the future. Move to a scheme where queueing of mark into a list of marks for destruction happens when the last reference to the mark is dropped. Also drop reference to the mark held by group list already when mark is removed from that list instead of dropping it only from the destruction worker. Reviewed-by: Miklos Szeredi <mszeredi@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2017-04-10fsnotify: Free fsnotify_mark_connector when there is no mark attachedJan Kara
Currently we free fsnotify_mark_connector structure only when inode / vfsmount is getting freed. This can however impose noticeable memory overhead when marks get attached to inodes only temporarily. So free the connector structure once the last mark is detached from the object. Since notification infrastructure can be working with the connector under the protection of fsnotify_mark_srcu, we have to be careful and free the fsnotify_mark_connector only after SRCU period passes. Reviewed-by: Miklos Szeredi <mszeredi@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2017-04-10fsnotify: Lock object list with connector lockJan Kara
So far list of marks attached to an object (inode / vfsmount) was protected by i_lock or mnt_root->d_lock. This dictates that the list must be empty before the object can be destroyed although the list is now anchored in the fsnotify_mark_connector structure. Protect the list by a spinlock in the fsnotify_mark_connector structure to decouple lifetime of a list of marks from a lifetime of the object. This also simplifies the code quite a bit since we don't have to differentiate between inode and vfsmount lists in quite a few places anymore. Reviewed-by: Miklos Szeredi <mszeredi@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2017-04-10fsnotify: Remove useless list deletion and commentJan Kara
After removing all the indirection it is clear that hlist_del_init_rcu(&mark->obj_list); in fsnotify_destroy_marks() is not needed as the mark gets removed from the list shortly afterwards in fsnotify_destroy_mark() -> fsnotify_detach_mark() -> fsnotify_detach_from_object(). Also there is no problem with mark being visible on object list while we call fsnotify_destroy_mark() as parallel destruction of marks from several places is properly handled (as mentioned in the comment in fsnotify_destroy_marks(). So just remove the list removal and also the stale comment. Reviewed-by: Miklos Szeredi <mszeredi@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2017-04-10fsnotify: Avoid double locking in fsnotify_detach_from_object()Jan Kara
We lock object list lock in fsnotify_detach_from_object() twice - once to detach mark and second time to recalculate mask. That is unnecessary and later it will become problematic as we will free the connector as soon as there is no mark in it. So move recalculation of fsnotify mask into the same critical section that is detaching mark. This also removes recalculation of child dentry flags from fsnotify_detach_from_object(). That is however fine. Those marks will get recalculated once some event happens on a child. Reviewed-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz>
2017-04-10fsnotify: Remove indirection from fsnotify_detach_mark()Jan Kara
fsnotify_detach_mark() calls fsnotify_destroy_inode_mark() or fsnotify_destroy_vfsmount_mark() to remove mark from object list. These two functions are however very similar and differ only in the lock they use to protect the object list of marks. Simplify the code by removing the indirection and removing mark from the object list in a common function. Reviewed-by: Miklos Szeredi <mszeredi@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2017-04-10fsnotify: Determine lock in fsnotify_destroy_marks()Jan Kara
Instead of passing spinlock into fsnotify_destroy_marks() determine it directly in that function from the connector type. This will reduce code churn when changing lock protecting list of marks. Reviewed-by: Miklos Szeredi <mszeredi@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2017-04-10fsnotify: Move locking into fsnotify_find_mark()Jan Kara
Move locking of a mark list into fsnotify_find_mark(). This reduces code churn in the following patch changing lock protecting the list. Reviewed-by: Miklos Szeredi <mszeredi@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2017-04-10fsnotify: Move locking into fsnotify_recalc_mask()Jan Kara
Move locking of locks protecting a list of marks into fsnotify_recalc_mask(). This reduces code churn in the following patch which changes the lock protecting the list of marks. Reviewed-by: Miklos Szeredi <mszeredi@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2017-04-10fsnotify: Move fsnotify_destroy_marks()Jan Kara
Move fsnotify_destroy_marks() to be later in the fs/notify/mark.c. It will need some functions that are declared after its current declaration. No functional change. Reviewed-by: Miklos Szeredi <mszeredi@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2017-04-10fsnotify: Remove indirection from mark list additionJan Kara
Adding notification mark to object list has been currently done through fsnotify_add_{inode|vfsmount}_mark() helpers from fsnotify_add_mark_locked() which call fsnotify_add_mark_list(). Remove this unnecessary indirection to simplify the code. Pushing all the locking to fsnotify_add_mark_list() also allows us to allocate the connector structure with GFP_KERNEL mode. Reviewed-by: Miklos Szeredi <mszeredi@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2017-04-10fsnotify: Make fsnotify_mark_connector hold inode referenceJan Kara
Currently inode reference is held by fsnotify marks. Change the rules so that inode reference is held by fsnotify_mark_connector structure whenever the list is non-empty. This simplifies the code and is more logical. Reviewed-by: Miklos Szeredi <mszeredi@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2017-04-10fsnotify: Move object pointer to fsnotify_mark_connectorJan Kara
Move pointer to inode / vfsmount from mark itself to the fsnotify_mark_connector structure. This is another step on the path towards decoupling inode / vfsmount lifetime from notification mark lifetime. Reviewed-by: Miklos Szeredi <mszeredi@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2017-04-10fsnotify: Move mark list head from object into dedicated structureJan Kara
Currently notification marks are attached to object (inode or vfsmnt) by a hlist_head in the object. The list is also protected by a spinlock in the object. So while there is any mark attached to the list of marks, the object must be pinned in memory (and thus e.g. last iput() deleting inode cannot happen). Also for list iteration in fsnotify() to work, we must hold fsnotify_mark_srcu lock so that mark itself and mark->obj_list.next cannot get freed. Thus we are required to wait for response to fanotify events from userspace process with fsnotify_mark_srcu lock held. That causes issues when userspace process is buggy and does not reply to some event - basically the whole notification subsystem gets eventually stuck. So to be able to drop fsnotify_mark_srcu lock while waiting for response, we have to pin the mark in memory and make sure it stays in the object list (as removing the mark waiting for response could lead to lost notification events for groups later in the list). However we don't want inode reclaim to block on such mark as that would lead to system just locking up elsewhere. This commit is the first in the series that paves way towards solving these conflicting lifetime needs. Instead of anchoring the list of marks directly in the object, we anchor it in a dedicated structure (fsnotify_mark_connector) and just point to that structure from the object. The following commits will also add spinlock protecting the list and object pointer to the structure. Reviewed-by: Miklos Szeredi <mszeredi@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2017-04-10fsnotify: Update commentsJan Kara
Add a comment that lifetime of a notification mark is protected by SRCU and remove a comment about clearing of marks attached to the inode. It is stale and more uptodate version is at fsnotify_destroy_marks() which is the function handling this case. Reviewed-by: Miklos Szeredi <mszeredi@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2016-12-23fsnotify: Remove fsnotify_duplicate_mark()Jan Kara
There are only two calls sites of fsnotify_duplicate_mark(). Those are in kernel/audit_tree.c and both are bogus. Vfsmount pointer is unused for audit tree, inode pointer and group gets set in fsnotify_add_mark_locked() later anyway, mask and free_mark are already set in alloc_chunk(). In fact, calling fsnotify_duplicate_mark() is actively harmful because following fsnotify_add_mark_locked() will leak group reference by overwriting the group pointer. So just remove the two calls to fsnotify_duplicate_mark() and the function. Signed-off-by: Jan Kara <jack@suse.cz> [PM: line wrapping to fit in 80 chars] Signed-off-by: Paul Moore <paul@paul-moore.com>
2016-05-19fsnotify: avoid spurious EMFILE errors from inotify_init()Jan Kara
Inotify instance is destroyed when all references to it are dropped. That not only means that the corresponding file descriptor needs to be closed but also that all corresponding instance marks are freed (as each mark holds a reference to the inotify instance). However marks are freed only after SRCU period ends which can take some time and thus if user rapidly creates and frees inotify instances, number of existing inotify instances can exceed max_user_instances limit although from user point of view there is always at most one existing instance. Thus inotify_init() returns EMFILE error which is hard to justify from user point of view. This problem is exposed by LTP inotify06 testcase on some machines. We fix the problem by making sure all group marks are properly freed while destroying inotify instance. We wait for SRCU period to end in that path anyway since we have to make sure there is no event being added to the instance while we are tearing down the instance. So it takes only some plumbing to allow for marks to be destroyed in that path as well and not from a dedicated work item. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Jan Kara <jack@suse.cz> Reported-by: Xiaoguang Wang <wangxg.fnst@cn.fujitsu.com> Tested-by: Xiaoguang Wang <wangxg.fnst@cn.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-18fsnotify: turn fsnotify reaper thread into a workqueue jobJeff Layton
We don't require a dedicated thread for fsnotify cleanup. Switch it over to a workqueue job instead that runs on the system_unbound_wq. In the interest of not thrashing the queued job too often when there are a lot of marks being removed, we delay the reaper job slightly when queueing it, to allow several to gather on the list. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Tested-by: Eryu Guan <guaneryu@gmail.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Eric Paris <eparis@parisplace.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-18Revert "fsnotify: destroy marks with call_srcu instead of dedicated thread"Jeff Layton
This reverts commit c510eff6beba ("fsnotify: destroy marks with call_srcu instead of dedicated thread"). Eryu reported that he was seeing some OOM kills kick in when running a testcase that adds and removes inotify marks on a file in a tight loop. The above commit changed the code to use call_srcu to clean up the marks. While that does (in principle) work, the srcu callback job is limited to cleaning up entries in small batches and only once per jiffy. It's easily possible to overwhelm that machinery with too many call_srcu callbacks, and Eryu's reproduer did just that. There's also another potential problem with using call_srcu here. While you can obviously sleep while holding the srcu_read_lock, the callbacks run under local_bh_disable, so you can't sleep there. It's possible when putting the last reference to the fsnotify_mark that we'll end up putting a chain of references including the fsnotify_group, uid, and associated keys. While I don't see any obvious ways that that could occurs, it's probably still best to avoid using call_srcu here after all. This patch reverts the above patch. A later patch will take a different approach to eliminated the dedicated thread here. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Reported-by: Eryu Guan <guaneryu@gmail.com> Tested-by: Eryu Guan <guaneryu@gmail.com> Cc: Jan Kara <jack@suse.com> Cc: Eric Paris <eparis@parisplace.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-14fsnotify: destroy marks with call_srcu instead of dedicated threadJeff Layton
At the time that this code was originally written, call_srcu didn't exist, so this thread was required to ensure that we waited for that SRCU grace period to settle before finally freeing the object. It does exist now however and we can much more efficiently use call_srcu to handle this. That also allows us to potentially use srcu_barrier to ensure that they are all of the callbacks have run before proceeding. In order to conserve space, we union the rcu_head with the g_list. This will be necessary for nfsd which will allocate marks from a dedicated slabcache. We have to be able to ensure that all of the objects are destroyed before destroying the cache. That's fairly Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Cc: Eric Paris <eparis@parisplace.org> Reviewed-by: Jan Kara <jack@suse.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04fsnotify: get rid of fsnotify_destroy_mark_locked()Jan Kara
fsnotify_destroy_mark_locked() is subtle to use because it temporarily releases group->mark_mutex. To avoid future problems with this function, split it into two. fsnotify_detach_mark() is the part that needs group->mark_mutex and fsnotify_free_mark() is the part that must be called outside of group->mark_mutex. This way it's much clearer what's going on and we also avoid some pointless acquisitions of group->mark_mutex. Signed-off-by: Jan Kara <jack@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04fsnotify: remove mark->free_listJan Kara
Free list is used when all marks on given inode / mount should be destroyed when inode / mount is going away. However we can free all of the marks without using a special list with some care. Signed-off-by: Jan Kara <jack@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-08-07fsnotify: fix oops in fsnotify_clear_marks_by_group_flags()Jan Kara
fsnotify_clear_marks_by_group_flags() can race with fsnotify_destroy_marks() so that when fsnotify_destroy_mark_locked() drops mark_mutex, a mark from the list iterated by fsnotify_clear_marks_by_group_flags() can be freed and thus the next entry pointer we have cached may become stale and we dereference free memory. Fix the problem by first moving marks to free to a special private list and then always free the first entry in the special list. This method is safe even when entries from the list can disappear once we drop the lock. Signed-off-by: Jan Kara <jack@suse.com> Reported-by: Ashish Sangwan <a.sangwan@samsung.com> Reviewed-by: Ashish Sangwan <a.sangwan@samsung.com> Cc: Lino Sanfilippo <LinoSanfilippo@gmx.de> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-07-21Revert "fsnotify: fix oops in fsnotify_clear_marks_by_group_flags()"Linus Torvalds
This reverts commit a2673b6e040663bf16a552f8619e6bde9f4b9acf. Kinglong Mee reports a memory leak with that patch, and Jan Kara confirms: "Thanks for report! You are right that my patch introduces a race between fsnotify kthread and fsnotify_destroy_group() which can result in leaking inotify event on group destruction. I haven't yet decided whether the right fix is not to queue events for dying notification group (as that is pointless anyway) or whether we should just fix the original problem differently... Whenever I look at fsnotify code mark handling I get lost in the maze of locks, lists, and subtle differences between how different notification systems handle notification marks :( I'll think about it over night" and after thinking about it, Jan says: "OK, I have looked into the code some more and I found another relatively simple way of fixing the original oops. It will be IMHO better than trying to fixup this issue which has more potential for breakage. I'll ask Linus to revert the fsnotify fix he already merged and send a new fix" Reported-by: Kinglong Mee <kinglongmee@gmail.com> Requested-by: Jan Kara <jack@suse.cz> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-07-17fsnotify: fix oops in fsnotify_clear_marks_by_group_flags()Jan Kara
fsnotify_clear_marks_by_group_flags() can race with fsnotify_destroy_marks() so when fsnotify_destroy_mark_locked() drops mark_mutex, a mark from the list iterated by fsnotify_clear_marks_by_group_flags() can be freed and we dereference free memory in the loop there. Fix the problem by keeping mark_mutex held in fsnotify_destroy_mark_locked(). The reason why we drop that mutex is that we need to call a ->freeing_mark() callback which may acquire mark_mutex again. To avoid this and similar lock inversion issues, we move the call to ->freeing_mark() callback to the kthread destroying the mark. Signed-off-by: Jan Kara <jack@suse.cz> Reported-by: Ashish Sangwan <a.sangwan@samsung.com> Suggested-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13fsnotify: remove destroy_list from fsnotify_markJan Kara
destroy_list is used to track marks which still need waiting for srcu period end before they can be freed. However by the time mark is added to destroy_list it isn't in group's list of marks anymore and thus we can reuse fsnotify_mark->g_list for queueing into destroy_list. This saves two pointers for each fsnotify_mark. Signed-off-by: Jan Kara <jack@suse.cz> Cc: Eric Paris <eparis@redhat.com> Cc: Heinrich Schuchardt <xypron.glpk@gmx.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13fsnotify: unify inode and mount marks handlingJan Kara
There's a lot of common code in inode and mount marks handling. Factor it out to a common helper function. Signed-off-by: Jan Kara <jack@suse.cz> Cc: Eric Paris <eparis@redhat.com> Cc: Heinrich Schuchardt <xypron.glpk@gmx.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-11-13fanotify: fix notification of groups with inode & mount marksJan Kara
fsnotify() needs to merge inode and mount marks lists when notifying groups about events so that ignore masks from inode marks are reflected in mount mark notifications and groups are notified in proper order (according to priorities). Currently the sorting of the lists done by fsnotify_add_inode_mark() / fsnotify_add_vfsmount_mark() and fsnotify() differed which resulted ignore masks not being used in some cases. Fix the problem by always using the same comparison function when sorting / merging the mark lists. Thanks to Heinrich Schuchardt for improvements of my patch. Link: https://bugzilla.kernel.org/show_bug.cgi?id=87721 Signed-off-by: Jan Kara <jack@suse.cz> Reported-by: Heinrich Schuchardt <xypron.glpk@gmx.de> Tested-by: Heinrich Schuchardt <xypron.glpk@gmx.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04fs/notify/mark.c: trivial cleanupDavid Cohen
Do not initialize private_destroy_list twice. list_replace_init() already takes care of initializing private_destroy_list. We don't need to initialize it with LIST_HEAD() beforehand. Signed-off-by: David Cohen <david.a.cohen@linux.intel.com> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-09fsnotify: update comments concerning locking schemeLino Sanfilippo
There have been changes in the locking scheme of fsnotify but the comments in the source code have not been updated yet. This patch corrects this. Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Cc: Eric Paris <eparis@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-11fsnotify: change locking orderLino Sanfilippo
On Mon, Aug 01, 2011 at 04:38:22PM -0400, Eric Paris wrote: > > I finally built and tested a v3.0 kernel with these patches (I know I'm > SOOOOOO far behind). Not what I hoped for: > > > [ 150.937798] VFS: Busy inodes after unmount of tmpfs. Self-destruct in 5 seconds. Have a nice day... > > [ 150.945290] BUG: unable to handle kernel NULL pointer dereference at 0000000000000070 > > [ 150.946012] IP: [<ffffffff810ffd58>] shmem_free_inode+0x18/0x50 > > [ 150.946012] PGD 2bf9e067 PUD 2bf9f067 PMD 0 > > [ 150.946012] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC > > [ 150.946012] CPU 0 > > [ 150.946012] Modules linked in: nfs lockd fscache auth_rpcgss nfs_acl sunrpc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables ext4 jbd2 crc16 joydev ata_piix i2c_piix4 pcspkr uinput ipv6 autofs4 usbhid [last unloaded: scsi_wait_scan] > > [ 150.946012] > > [ 150.946012] Pid: 2764, comm: syscall_thrash Not tainted 3.0.0+ #1 Red Hat KVM > > [ 150.946012] RIP: 0010:[<ffffffff810ffd58>] [<ffffffff810ffd58>] shmem_free_inode+0x18/0x50 > > [ 150.946012] RSP: 0018:ffff88002c2e5df8 EFLAGS: 00010282 > > [ 150.946012] RAX: 000000004e370d9f RBX: 0000000000000000 RCX: ffff88003a029438 > > [ 150.946012] RDX: 0000000033630a5f RSI: 0000000000000000 RDI: ffff88003491c240 > > [ 150.946012] RBP: ffff88002c2e5e08 R08: 0000000000000000 R09: 0000000000000000 > > [ 150.946012] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88003a029428 > > [ 150.946012] R13: ffff88003a029428 R14: ffff88003a029428 R15: ffff88003499a610 > > [ 150.946012] FS: 00007f5a05420700(0000) GS:ffff88003f600000(0000) knlGS:0000000000000000 > > [ 150.946012] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b > > [ 150.946012] CR2: 0000000000000070 CR3: 000000002a662000 CR4: 00000000000006f0 > > [ 150.946012] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > > [ 150.946012] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 > > [ 150.946012] Process syscall_thrash (pid: 2764, threadinfo ffff88002c2e4000, task ffff88002bfbc760) > > [ 150.946012] Stack: > > [ 150.946012] ffff88003a029438 ffff88003a029428 ffff88002c2e5e38 ffffffff81102f76 > > [ 150.946012] ffff88003a029438 ffff88003a029598 ffffffff8160f9c0 ffff88002c221250 > > [ 150.946012] ffff88002c2e5e68 ffffffff8115e9be ffff88002c2e5e68 ffff88003a029438 > > [ 150.946012] Call Trace: > > [ 150.946012] [<ffffffff81102f76>] shmem_evict_inode+0x76/0x130 > > [ 150.946012] [<ffffffff8115e9be>] evict+0x7e/0x170 > > [ 150.946012] [<ffffffff8115ee40>] iput_final+0xd0/0x190 > > [ 150.946012] [<ffffffff8115ef33>] iput+0x33/0x40 > > [ 150.946012] [<ffffffff81180205>] fsnotify_destroy_mark_locked+0x145/0x160 > > [ 150.946012] [<ffffffff81180316>] fsnotify_destroy_mark+0x36/0x50 > > [ 150.946012] [<ffffffff81181937>] sys_inotify_rm_watch+0x77/0xd0 > > [ 150.946012] [<ffffffff815aca52>] system_call_fastpath+0x16/0x1b > > [ 150.946012] Code: 67 4a 00 b8 e4 ff ff ff eb aa 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 48 83 ec 10 48 89 1c 24 4c 89 64 24 08 48 8b 9f 40 05 00 00 > > [ 150.946012] 83 7b 70 00 74 1c 4c 8d a3 80 00 00 00 4c 89 e7 e8 d2 5d 4a > > [ 150.946012] RIP [<ffffffff810ffd58>] shmem_free_inode+0x18/0x50 > > [ 150.946012] RSP <ffff88002c2e5df8> > > [ 150.946012] CR2: 0000000000000070 > > Looks at aweful lot like the problem from: > http://www.spinics.net/lists/linux-fsdevel/msg46101.html > I tried to reproduce this bug with your test program, but without success. However, if I understand correctly, this occurs since we dont hold any locks when we call iput() in mark_destroy(), right? With the patches you tested, iput() is also not called within any lock, since the groups mark_mutex is released temporarily before iput() is called. This is, since the original codes behaviour is similar. However since we now have a mutex as the biggest lock, we can do what you suggested (http://www.spinics.net/lists/linux-fsdevel/msg46107.html) and call iput() with the mutex held to avoid the race. The patch below implements this. It uses nested locking to avoid deadlock in case we do the final iput() on an inode which still holds marks and thus would take the mutex again when calling fsnotify_inode_delete() in destroy_inode(). Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Signed-off-by: Eric Paris <eparis@redhat.com>
2012-12-11fsnotify: dont put marks on temporary list when clearing marks by groupLino Sanfilippo
In clear_marks_by_group_flags() the mark list of a group is iterated and the marks are put on a temporary list. Since we introduced fsnotify_destroy_mark_locked() we dont need the temp list any more and are able to remove the marks while the mark list is iterated and the mark list mutex is held. Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Signed-off-by: Eric Paris <eparis@redhat.com>
2012-12-11fsnotify: introduce locked versions of fsnotify_add_mark() and ↵Lino Sanfilippo
fsnotify_remove_mark() This patch introduces fsnotify_add_mark_locked() and fsnotify_remove_mark_locked() which are essentially the same as fsnotify_add_mark() and fsnotify_remove_mark() but assume that the caller has already taken the groups mark mutex. Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Signed-off-by: Eric Paris <eparis@redhat.com>
2012-12-11fsnotify: pass group to fsnotify_destroy_mark()Lino Sanfilippo
In fsnotify_destroy_mark() dont get the group from the passed mark anymore, but pass the group itself as an additional parameter to the function. Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Signed-off-by: Eric Paris <eparis@redhat.com>
2012-12-11fsnotify: use a mutex instead of a spinlock to protect a groups mark listLino Sanfilippo
Replaces the groups mark_lock spinlock with a mutex. Using a mutex instead of a spinlock results in more flexibility (i.e it allows to sleep while the lock is held). Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Signed-off-by: Eric Paris <eparis@redhat.com>
2012-12-11fsnotify: take groups mark_lock before mark lockLino Sanfilippo
Race-free addition and removal of a mark to a groups mark list would be easier if we could lock the mark list of group before we lock the specific mark. This patch changes the order used to add/remove marks to/from mark lists from 1. mark->lock 2. group->mark_lock 3. inode->i_lock to 1. group->mark_lock 2. mark->lock 3. inode->i_lock Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Signed-off-by: Eric Paris <eparis@redhat.com>
2012-12-11fsnotify: use reference counting for groupsLino Sanfilippo
Get a group ref for each mark that is added to the groups list and release that ref when the mark is freed in fsnotify_put_mark(). We also use get a group reference for duplicated marks and for private event data. Now we dont free a group any more when the number of marks becomes 0 but when the groups ref count does. Since this will only happen when all marks are removed from a groups mark list, we dont have to set the groups number of marks to 1 at group creation. Beside clearing all marks in fsnotify_destroy_group() we do also flush the groups event queue. This is since events may hold references to groups (due to private event data) and we have to put those references first before we get a chance to put the final ref, which will result in a call to fsnotify_final_destroy_group(). Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Signed-off-by: Eric Paris <eparis@redhat.com>