Age | Commit message (Collapse) | Author |
|
max_discard_sectors must be set for the queue to support discard.
Operations implementing discard for rbd zero data, so report that.
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
|
|
Only allocate two osd ops for discard requests, since the
preallocation hint is only added for regular writes. Use
rbd_img_obj_request_fill() to recreate the original write or discard
osd operations, isolating that logic to one place, and change the
assert in rbd_osd_req_create_copyup() to accept discard requests as
well.
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
|
|
rbd_img_request_fill() creates a ceph_osd_request and has logic for
adding the appropriate osd ops to it based on the request type and
image properties.
For layered images, the original rbd_obj_request is resent with a
copyup operation in front, using a new ceph_osd_request. The logic for
adding the original operations should be the same as when first
sending them, so move it to a helper function.
op_type only needs to be checked once, so create a helper for that as
well and call it outside the loop in rbd_img_request_fill().
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
|
|
Discard requests are a form of write, so they should go through the
same process as plain write requests and trigger copy-on-write for
layered images.
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
|
|
Discard may try to delete an object from a non-layered image that does not exist.
If this occurs, the image already has no data in that range, so change the
result to success.
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
|
|
Discards take a reference to the snapshot context of an image when
they are created. This reference needs to be cleaned up when the
request is done just as it is for regular writes.
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
|
|
In rbd_img_request_fill() the image size is only checked to determine
whether we can truncate an object instead of zeroing it for discard
requests. Take rbd_dev->header_rwsem while reading the image size, and
move this read into the discard check, so that non-discard ops don't
need to take the semaphore in this function.
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
|
|
This patch add the discard support for rbd driver.
There are three types operation in the driver:
1. The objects would be removed if they completely contained
within the discard range.
2. The objects would be truncated if they partly contained within
the discard range, and align with their boundary.
3. Others would be zeroed.
A discard request from blkdev_issue_discard() is defined which
REQ_WRITE and REQ_DISCARD both marked and no data, so we must
check the REQ_DISCARD first when getting the request type.
This resolve:
http://tracker.ceph.com/issues/190
[ Ilya Dryomov: This is incomplete and somewhat buggy, see follow up
commits by Josh Durgin for refinements and fixes which weren't
folded in to preserve authorship. ]
Signed-off-by: Guangliang Zhao <lucienchao@gmail.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>
|
|
It could only handle the read and write operations now,
extend it for the coming discard support.
Signed-off-by: Guangliang Zhao <lucienchao@gmail.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>
|
|
It need to copyup the parent's content when layered writing,
but an entire object write would overwrite it, so skip it.
Signed-off-by: Guangliang Zhao <lucienchao@gmail.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>
|
|
To clarify the conditions and make it easier to add new ones.
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
|
|
These fields may both change while the image is mapped if a snapshot
is created or deleted or the image is resized. They are guarded by
rbd_dev->header_rwsem, so hold that while reading them, and store a
local copy to refer to outside of the critical section. The local copy
will stay consistent since the snapshot context is reference counted,
and the mapping size is just a u64. This prevents torn loads from
giving us inconsistent values.
Move reading header.snapc into the caller of rbd_img_request_create()
so that we only need to take the semaphore once. The read-only caller,
rbd_parent_request_create() can just pass NULL for snapc, since the
snapshot context is only relevant for writes.
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
|
|
Trying to map an image out of a pool for which we don't have an 'x'
permission bit fails with -ERANGE from ceph_extract_encoded_string()
due to an unsigned vs signed bug. Fix it and get rid of the -EINVAL
sink, thus propagating rbd::get_id cls method errors. (I've seen
a bunch of unexplained -ERANGE reports, I bet this is it).
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>
|
|
queue_work() doesn't "fail to queue", it returns false if work was
already on a queue, which can't happen here since we allocate
event_work right before we queue it. So don't bother at all.
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>
|
|
we may corrupt waiting list if a request in the waiting list is kicked.
Signed-off-by: Yan, Zheng <zyan@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>
|
|
Signed-off-by: Yan, Zheng <zyan@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>
|
|
Use the more common pr_warn.
Other miscellanea:
o Coalesce formats
o Realign arguments
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
|
|
So the recovering MDS does not need to fetch these ununsed inodes during
cache rejoin. This may reduce MDS recovery time.
Signed-off-by: Yan, Zheng <zyan@redhat.com>
|
|
If the state variable is krealloced successfully, map->osd_state will be
freed, once following two reallocation failed, and exit the function
without resetting map->osd_state, map->osd_state become a wild pointer.
fix it by resetting them after krealloc successfully.
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
|
|
We want "cbc(aes)" algorithm, so select CRYPTO_CBC too, not just
CRYPTO_AES. Otherwise on !CRYPTO_CBC kernels we fail rbd map/mount
with
libceph: error -2 building auth method x request
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
|
|
Both not yet registered (r_linger && list_empty(&r_linger_item)) and
registered linger requests should use the new tid on resend to avoid
the dup op detection logic on the OSDs, yet we were doing this only for
"registered" case. Factor out and simplify the "registered" logic and
use the new helper for "not registered" case as well.
Fixes: http://tracker.ceph.com/issues/8806
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>
|
|
Introduce __enqueue_request() and switch to it.
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>
|
|
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"This is a set of two small fixes, both to code which went in during
the merge window: cxgb4i has a scheduling in atomic bug in its new
ipv6 code and uas fails to work properly with the new scsi-mq code"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
[SCSI] uas: disable use of blk-mq I/O path
[SCSI] cxgb4i: avoid holding mutex in interrupt context
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/josh/linux
Pull kconfig fixes for tiny setups from Josh Triplett:
"Two Kconfig bugfixes for 3.17 related to tinification. These fixes
make the Kconfig "General Setup" menu much more usable"
* tag 'tiny/kconfig-for-3.17' of https://git.kernel.org/pub/scm/linux/kernel/git/josh/linux:
init/Kconfig: Fix HAVE_FUTEX_CMPXCHG to not break up the EXPERT menu
init/Kconfig: Hide printk log config if CONFIG_PRINTK=n
|
|
commit 03b8c7b623c80af264c4c8d6111e5c6289933666 ("futex: Allow
architectures to skip futex_atomic_cmpxchg_inatomic() test") added the
HAVE_FUTEX_CMPXCHG symbol right below FUTEX. This placed it right in
the middle of the options for the EXPERT menu. However,
HAVE_FUTEX_CMPXCHG does not depend on EXPERT or FUTEX, so Kconfig stops
placing items in the EXPERT menu, and displays the remaining several
EXPERT items (starting with EPOLL) directly in the General Setup menu.
Since both users of HAVE_FUTEX_CMPXCHG only select it "if FUTEX", make
HAVE_FUTEX_CMPXCHG itself depend on FUTEX. With this change, the
subsequent items display as part of the EXPERT menu again; the EMBEDDED
menu now appears as the next top-level item in the General Setup menu,
which makes General Setup much shorter and more usable.
Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Cc: stable <stable@vger.kernel.org>
|
|
The buffers sized by CONFIG_LOG_BUF_SHIFT and
CONFIG_LOG_CPU_MAX_BUF_SHIFT do not exist if CONFIG_PRINTK=n, so don't
ask about their size at all.
Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Cc: stable <stable@vger.kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"Two i2c driver bugfixes"
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: qup: Fix order of runtime pm initialization
i2c: rk3x: fix 0 length write transfers
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull trace ring buffer iterator fix from Steven Rostedt:
"While testing some new changes for 3.18, I kept hitting a bug every so
often in the ring buffer. At first I thought it had to do with some
of the changes I was working on, but then testing something else I
realized that the bug was in 3.17 itself. I ran several bisects as
the bug was not very reproducible, and finally came up with the commit
that I could reproduce easily within a few minutes, and without the
change I could run the tests over an hour without issue. The change
fit the bug and I figured out a fix. That bad commit was:
Commit 651e22f2701b "ring-buffer: Always reset iterator to reader page"
This commit fixed a bug, but in the process created another one. It
used the wrong value as the cached value that is used to see if things
changed while an iterator was in use. This made it look like a change
always happened, and could cause the iterator to go into an infinite
loop"
* tag 'trace-fixes-v3.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
ring-buffer: Fix infinite spin in reading buffer
|
|
Pull cifs/smb3 fixes from Steve French:
"Fix for CIFS/SMB3 oops on reconnect during readpages (3.17 regression)
and for incorrectly closing file handle in symlink error cases"
* 'for-linus' of git://git.samba.org/sfrench/cifs-2.6:
CIFS: Fix readpages retrying on reconnects
Fix problem recognizing symlinks
|
|
Pull raid5 discard fix from Neil Brown:
"One fix for raid5 discard issue"
* tag 'md/3.17-final-fix' of git://neil.brown.name/md:
md/raid5: disable 'DISCARD' by default due to safety concerns.
|
|
Pull drm fixes from Dave Airlie:
"Nothing too major or scary.
One i915 regression fix, nouveau has a tmds regression fix, along with
a regression fix for the runtime pm code for optimus laptops not
restoring the display hw correctly"
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
drm/nouveau: make sure display hardware is reinitialised on runtime resume
drm/nouveau: punt fbcon resume out to a workqueue
drm/nouveau: fix regression on original nv50 board
drm/nv50/disp: fix dpms regression on certain boards
drm/i915: Flush the PTEs after updating them before suspend
|
|
The uas driver uses the block layer tag for USB3 stream IDs. With
blk-mq we can get larger tag numbers that the queue depth, which breaks
this assumption. A fix is under way for 3.18, but sits on top of
large changes so can't easily be backported. Set the disable_blk_mq
path so that a uas device can't easily crash the system when using
blk-mq for SCSI.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI and power management fixes from Rafael Wysocki:
"These are three regression fixes (cpufreq core, pcc-cpufreq, i915 /
ACPI) and one trivial fix for a callback return value mismatch in the
cpufreq integrator driver.
Specifics:
- A recent cpufreq core fix went too far and introduced a regression
in the system suspend code path. Fix from Viresh Kumar.
- An ACPI-related commit in the i915 driver that fixed backlight
problems for some Thinkpads inadvertently broke a Dell machine (in
3.16). Fix from Aaron Lu.
- The pcc-cpufreq driver was broken during the 3.15 cycle by a commit
that put wait_event() under a spinlock by mistake. Fix that
(Rafael J Wysocki).
- The return value type of integrator_cpufreq_remove() is void, but
should be int. Fix from Arnd Bergmann"
* tag 'pm+acpi-3.17-final' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq: update 'cpufreq_suspended' after stopping governors
ACPI / i915: Update the condition to ignore firmware backlight change request
cpufreq: integrator: fix integrator_cpufreq_remove return type
cpufreq: pcc-cpufreq: Fix wait_event() under spinlock
|
|
git://anongit.freedesktop.org/drm-intel into drm-fixes
final regression fix for 3.17.
* tag 'drm-intel-fixes-2014-10-02' of git://anongit.freedesktop.org/drm-intel:
drm/i915: Flush the PTEs after updating them before suspend
|
|
The runtime pm calls need to be done before populating the children via the
i2c_add_adapter call. If this is not done, a child can run into issues trying
to do i2c read/writes due to the pm_runtime_sync failing.
Signed-off-by: Andy Gross <agross@codeaurora.org>
Reviewed-by: Felipe Balbi <balbi@ti.com>
Acked-by: Bjorn Andersson <bjorn.andersson@sonymobile.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Cc: stable@kernel.org
|
|
i2cdetect -q was broken (everything was a false positive, and no transfers were
actually being sent over i2c). The way it works is by sending a 0 length write
request and checking for NACK. This patch fixes the 0 length writes and actually
sends them.
Reported-by: Doug Anderson <dianders@chromium.org>
Signed-off-by: Alexandru M Stan <amstan@chromium.org>
Tested-by: Doug Anderson <dianders@chromium.org>
Tested-by: Max Schwarz <max.schwarz@online.de>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Cc: stable@kernel.org
|
|
* pm-cpufreq:
cpufreq: update 'cpufreq_suspended' after stopping governors
cpufreq: integrator: fix integrator_cpufreq_remove return type
cpufreq: pcc-cpufreq: Fix wait_event() under spinlock
* acpi-video:
ACPI / i915: Update the condition to ignore firmware backlight change request
|
|
Merge fixes from Andrew Morton:
"5 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mm: page_alloc: fix zone allocation fairness on UP
perf: fix perf bug in fork()
MAINTAINERS: change git URL for mpc5xxx tree
mm: memcontrol: do not iterate uninitialized memcgs
ocfs2/dlm: should put mle when goto kill in dlm_assert_master_handler
|
|
The zone allocation batches can easily underflow due to higher-order
allocations or spills to remote nodes. On SMP that's fine, because
underflows are expected from concurrency and dealt with by returning 0.
But on UP, zone_page_state will just return a wrapped unsigned long,
which will get past the <= 0 check and then consider the zone eligible
until its watermarks are hit.
Commit 3a025760fc15 ("mm: page_alloc: spill to remote nodes before
waking kswapd") already made the counter-resetting use
atomic_long_read() to accomodate underflows from remote spills, but it
didn't go all the way with it.
Make it clear that these batches are expected to go negative regardless
of concurrency, and use atomic_long_read() everywhere.
Fixes: 81c0a2bb515f ("mm: page_alloc: fair zone allocator policy")
Reported-by: Vlastimil Babka <vbabka@suse.cz>
Reported-by: Leon Romanovsky <leon@leon.nu>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: <stable@vger.kernel.org> [3.12+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Oleg noticed that a cleanup by Sylvain actually uncovered a bug; by
calling perf_event_free_task() when failing sched_fork() we will not yet
have done the memset() on ->perf_event_ctxp[] and will therefore try and
'free' the inherited contexts, which are still in use by the parent
process. This is bad..
Suggested-by: Oleg Nesterov <oleg@redhat.com>
Reported-by: Oleg Nesterov <oleg@redhat.com>
Reported-by: Sylvain 'ythier' Hitier <sylvain.hitier@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The repository for mpc5xxx has been moved, update git URL to new
location.
Signed-off-by: Anatolij Gustschin <agust@denx.de>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The cgroup iterators yield css objects that have not yet gone through
css_online(), but they are not complete memcgs at this point and so the
memcg iterators should not return them. Commit d8ad30559715 ("mm/memcg:
iteration skip memcgs not yet fully initialized") set out to implement
exactly this, but it uses CSS_ONLINE, a cgroup-internal flag that does
not meet the ordering requirements for memcg, and so the iterator may
skip over initialized groups, or return partially initialized memcgs.
The cgroup core can not reasonably provide a clear answer on whether the
object around the css has been fully initialized, as that depends on
controller-specific locking and lifetime rules. Thus, introduce a
memcg-specific flag that is set after the memcg has been initialized in
css_online(), and read before mem_cgroup_iter() callers access the memcg
members.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Tejun Heo <tj@kernel.org>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: Hugh Dickins <hughd@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: <stable@vger.kernel.org> [3.12+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
In dlm_assert_master_handler, the mle is get in dlm_find_mle, should be
put when goto kill, otherwise, this mle will never be released.
Signed-off-by: Alex Chen <alex.chen@huawei.com>
Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
Reviewed-by: joyce.xue <xuejiufei@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media fix from Mauro Carvalho Chehab:
"One last time regression fix at em28xx. The removal of .reset_resume
broke suspend/resume on this driver for some devices.
There are more fixes to be done for em28xx suspend/resume to be better
handled, but I'm opting to let them to stay for a while at the media
devel tree, in order to get more tests. So, for now, let's just
revert this patch"
* tag 'media/v3.17-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
Revert "[media] media: em28xx - remove reset_resume interface"
|
|
Commit 651e22f2701b "ring-buffer: Always reset iterator to reader page"
fixed one bug but in the process caused another one. The reset is to
update the header page, but that fix also changed the way the cached
reads were updated. The cache reads are used to test if an iterator
needs to be updated or not.
A ring buffer iterator, when created, disables writes to the ring buffer
but does not stop other readers or consuming reads from happening.
Although all readers are synchronized via a lock, they are only
synchronized when in the ring buffer functions. Those functions may
be called by any number of readers. The iterator continues down when
its not interrupted by a consuming reader. If a consuming read
occurs, the iterator starts from the beginning of the buffer.
The way the iterator sees that a consuming read has happened since
its last read is by checking the reader "cache". The cache holds the
last counts of the read and the reader page itself.
Commit 651e22f2701b changed what was saved by the cache_read when
the rb_iter_reset() occurred, making the iterator never match the cache.
Then if the iterator calls rb_iter_reset(), it will go into an
infinite loop by checking if the cache doesn't match, doing the reset
and retrying, just to see that the cache still doesn't match! Which
should never happen as the reset is suppose to set the cache to the
current value and there's locks that keep a consuming reader from
having access to the data.
Fixes: 651e22f2701b "ring-buffer: Always reset iterator to reader page"
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
Pull parisc fix from Helge Deller:
"One late but trivial patch to fix the serial console on parisc
machines which got broken during the 3.17 release cycle"
* 'parisc-3.17-8' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Fix serial console for machines with serial port on superio chip
|
|
If we got a reconnect error from async readv we re-add pages back
to page_list and continue loop. That is wrong because these pages
have been already added to the pagecache but page_list has pages that
have not been added to the pagecache yet. This ends up with a general
protection fault in put_pages after readpages. Fix it by not retrying
the read of these pages and falling back to readpage instead.
Fixes debian bug 762306
Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org>
Signed-off-by: Steve French <smfrench@gmail.com>
Tested-by: Arthur Marsh <arthur.marsh@internode.on.net>
|
|
Changeset eb85d94bd introduced a problem where if a cifs open
fails during query info of a file we
will still try to close the file (happens with certain types
of reparse points) even though the file handle is not valid.
In addition for SMB2/SMB3 we were not mapping the return code returned
by Windows when trying to open a file (like a Windows NFS symlink)
which is a reparse point.
Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Pavel Shilovsky <pshilovsky@samba.org>
CC: stable <stable@vger.kernel.org> #v3.13+
|
|
Merge NUMA balancing related fixlets from Mel Gorman:
"There were a few minor changes so am resending just the two patches
that are mostly likely to affect the bug Dave and Sasha saw and marked
them for stable.
I'm less confident it will address Sasha's problem because while I
have not kept up to date, I believe he's also seeing memory corruption
issues in next from an unknown source. Still, it would be nice to see
how they affect trinity testing.
I'll send the MPOL_MF_LAZY patch separately because it's not urgent"
* emailed patches from Mel Gorman <mgorman@suse.de>:
mm: numa: Do not mark PTEs pte_numa when splitting huge pages
mm: migrate: Close race between migration completion and mprotect
|