aboutsummaryrefslogtreecommitdiff
path: root/drivers/scsi/scsi_lib.c
AgeCommit message (Collapse)Author
2011-05-18Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-blockLinus Torvalds
* 'for-linus' of git://git.kernel.dk/linux-2.6-block: block: don't delay blk_run_queue_async scsi: remove performance regression due to async queue run blk-throttle: Use task_subsys_state() to determine a task's blkio_cgroup block: rescan partitions on invalidated devices on -ENOMEDIA too cdrom: always check_disk_change() on open block: unexport DISK_EVENT_MEDIA_CHANGE for legacy/fringe drivers
2011-05-17scsi: remove performance regression due to async queue runJens Axboe
Commit c21e6beb removed our queue request_fn re-enter protection, and defaulted to always running the queues from kblockd to be safe. This was a known potential slow down, but should be safe. Unfortunately this is causing big performance regressions for some, so we need to improve this logic. Looking into the details of the re-enter, the real issue is on requeue of requests. Requeue of requests upon seeing a BUSY condition from the device ends up re-running the queue, causing traces like this: scsi_request_fn() scsi_dispatch_cmd() scsi_queue_insert() __scsi_queue_insert() scsi_run_queue() scsi_request_fn() ... potentially causing the issue we want to avoid. So special case the requeue re-run of the queue, but improve it to offload the entire run of local queue and starved queue from a single workqueue callback. This is a lot better than potentially kicking off a workqueue run for each device seen. This also fixes the issue of the local device going into recursion, since the above mentioned commit never moved that queue run out of line. Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-05-03[SCSI] fix oops in scsi_run_queue()James Bottomley
The recent commit closing the race window in device teardown: commit 86cbfb5607d4b81b1a993ff689bbd2addd5d3a9b Author: James Bottomley <James.Bottomley@suse.de> Date: Fri Apr 22 10:39:59 2011 -0500 [SCSI] put stricter guards on queue dead checks is causing a potential NULL deref in scsi_run_queue() because the q->queuedata may already be NULL by the time this function is called. Since we shouldn't be running a queue that is being torn down, simply add a NULL check in scsi_run_queue() to forestall this. Tested-by: Jim Schutt <jaschut@sandia.gov> Cc: stable@kernel.org Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2011-04-19block: get rid of QUEUE_FLAG_REENTERJens Axboe
We are currently using this flag to check whether it's safe to call into ->request_fn(). If it is set, we punt to kblockd. But we get a lot of false positives and excessive punts to kblockd, which hurts performance. The only real abuser of this infrastructure is SCSI. So export the async queue run and convert SCSI over to use that. There's room for improvement in that SCSI need not always use the async call, but this fixes our performance issue and they can fix that up in due time. Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-04-18block: add blk_run_queue_asyncChristoph Hellwig
Instead of overloading __blk_run_queue to force an offload to kblockd add a new blk_run_queue_async helper to do it explicitly. I've kept the blk_queue_stopped check for now, but I suspect it's not needed as the check we do when the workqueue items runs should be enough. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-24Merge branch 'for-2.6.39/core' of git://git.kernel.dk/linux-2.6-blockLinus Torvalds
* 'for-2.6.39/core' of git://git.kernel.dk/linux-2.6-block: (65 commits) Documentation/iostats.txt: bit-size reference etc. cfq-iosched: removing unnecessary think time checking cfq-iosched: Don't clear queue stats when preempt. blk-throttle: Reset group slice when limits are changed blk-cgroup: Only give unaccounted_time under debug cfq-iosched: Don't set active queue in preempt block: fix non-atomic access to genhd inflight structures block: attempt to merge with existing requests on plug flush block: NULL dereference on error path in __blkdev_get() cfq-iosched: Don't update group weights when on service tree fs: assign sb->s_bdi to default_backing_dev_info if the bdi is going away block: Require subsystems to explicitly allocate bio_set integrity mempool jbd2: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging jbd: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging fs: make fsync_buffers_list() plug mm: make generic_writepages() use plugging blk-cgroup: Add unaccounted time to timeslice_used. block: fixup plugging stubs for !CONFIG_BLOCK block: remove obsolete comments for blkdev_issue_zeroout. blktrace: Use rq->cmd_flags directly in blk_add_trace_rq. ... Fix up conflicts in fs/{aio.c,super.c}
2011-03-17Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (170 commits) [SCSI] scsi_dh_rdac: Add MD36xxf into device list [SCSI] scsi_debug: add consecutive medium errors [SCSI] libsas: fix ata list corruption issue [SCSI] hpsa: export resettable host attribute [SCSI] hpsa: move device attributes to avoid forward declarations [SCSI] scsi_debug: Logical Block Provisioning (SBC3r26) [SCSI] sd: Logical Block Provisioning update [SCSI] Include protection operation in SCSI command trace [SCSI] hpsa: fix incorrect PCI IDs and add two new ones (2nd try) [SCSI] target: Fix volume size misreporting for volumes > 2TB [SCSI] bnx2fc: Broadcom FCoE offload driver [SCSI] fcoe: fix broken fcoe interface reset [SCSI] fcoe: precedence bug in fcoe_filter_frames() [SCSI] libfcoe: Remove stale fcoe-netdev entries [SCSI] libfcoe: Move FCOE_MTU definition from fcoe.h to libfcoe.h [SCSI] libfc: introduce __fc_fill_fc_hdr that accepts fc_hdr as an argument [SCSI] fcoe, libfc: initialize EM anchors list and then update npiv EMs [SCSI] Revert "[SCSI] libfc: fix exchange being deleted when the abort itself is timed out" [SCSI] libfc: Fixing a memory leak when destroying an interface [SCSI] megaraid_sas: Version and Changelog update ... Fix up trivial conflicts due to whitespace differences in drivers/scsi/libsas/{sas_ata.c,sas_scsi_host.c}
2011-03-14[SCSI] sd: Logical Block Provisioning updateMartin K. Petersen
SBC3r26 contains many changes to the Logical Block Provisioning interfaces (formerly known as Thin Provisioning ditto). This patch implements support for both the old and new schemes using the same heuristic as before (whether the LBP VPD page is present). The new code also allows the provisioning mode (i.e. choice of command) to be overridden on a per-device basis via sysfs. Two additional modes are supported in this version: - WRITE SAME(10) with the UNMAP bit set - WRITE SAME(10) without the UNMAP bit set. This allows us to support devices that predate the TP/LBP enhancements in SBC3 and which work by way zero-detection Switching between modes has been consolidated in a helper function that also updates the block layer topology according to the limitations of the chosen command. I experimented with trying WRITE SAME(16) if UNMAP fails, WRITE SAME(10) if WRITE SAME(16) fails, etc. but found several devices that got cranky. So for now we'll disable discard if one of the commands fail. The user still has the option of selecting a different mode in sysfs. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2011-03-14[SCSI] Include protection operation in SCSI command traceMartin K. Petersen
When debugging DIF/DIX it is very helpful to be able to see which DIX operation is associated with the scsi_cmnd. Include the protection op in the SCSI command trace. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2011-03-10Merge branch 'for-2.6.39/stack-plug' into for-2.6.39/coreJens Axboe
Conflicts: block/blk-core.c block/blk-flush.c drivers/md/raid1.c drivers/md/raid10.c drivers/md/raid5.c fs/nilfs2/btnode.c fs/nilfs2/mdt.c Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-10scsi: convert to blk_delay_queue()Jens Axboe
It was always abuse to reuse the plugging infrastructure for this, convert it to the (new) real API for delaying queueing a bit. A default delay of 3 msec is defined, to match the previous behaviour. Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-02block: add @force_kblockd to __blk_run_queue()Tejun Heo
__blk_run_queue() automatically either calls q->request_fn() directly or schedules kblockd depending on whether the function is recursed. blk-flush implementation needs to be able to explicitly choose kblockd. Add @force_kblockd. All the current users are converted to specify %false for the parameter and this patch doesn't introduce any behavior change. stable: This is prerequisite for fixing ide oops caused by the new blk-flush implementation. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Jan Beulich <JBeulich@novell.com> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: stable@kernel.org Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-02-12[SCSI] Add detailed SCSI I/O errorsHannes Reinecke
Instead of just passing 'EIO' for any I/O error we should be notifying the upper layers with more details about the cause of this error. Update the possible I/O errors to: - ENOLINK: Link failure between host and target - EIO: Retryable I/O error - EREMOTEIO: Non-retryable I/O error - EBADE: I/O error restricted to the I_T_L nexus 'Retryable' in this context means that an I/O error _might_ be restricted to the I_T_L nexus (vulgo: path), so retrying on another nexus / path might succeed. 'Non-retryable' in general refers to a target failure, so this error will always be generated regardless of the I_T_L nexus it was send on. I/O errors restricted to the I_T_L nexus might be retried on another nexus / path, but they should _not_ be queued if no paths are available. Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2011-01-13Merge branch 'for-2.6.38/core' of git://git.kernel.dk/linux-2.6-blockLinus Torvalds
* 'for-2.6.38/core' of git://git.kernel.dk/linux-2.6-block: (43 commits) block: ensure that completion error gets properly traced blktrace: add missing probe argument to block_bio_complete block cfq: don't use atomic_t for cfq_group block cfq: don't use atomic_t for cfq_queue block: trace event block fix unassigned field block: add internal hd part table references block: fix accounting bug on cross partition merges kref: add kref_test_and_get bio-integrity: mark kintegrityd_wq highpri and CPU intensive block: make kblockd_workqueue smarter Revert "sd: implement sd_check_events()" block: Clean up exit_io_context() source code. Fix compile warnings due to missing removal of a 'ret' variable fs/block: type signature of major_to_index(int) to major_to_index(unsigned) block: convert !IS_ERR(p) && p to !IS_ERR_NOR_NULL(p) cfq-iosched: don't check cfqg in choose_service_tree() fs/splice: Pull buf->ops->confirm() from splice_from_pipe actors cdrom: export cdrom_check_events() sd: implement sd_check_events() sr: implement sr_check_events() ...
2011-01-07Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (147 commits) [SCSI] arcmsr: fix write to device check [SCSI] lpfc: lower stack use in lpfc_fc_frame_check [SCSI] eliminate an unnecessary local variable from scsi_remove_target() [SCSI] libiscsi: use bh locking instead of irq with session lock [SCSI] libiscsi: do not take host lock in queuecommand [SCSI] be2iscsi: fix null ptr when accessing task hdr [SCSI] be2iscsi: fix gfp use in alloc_pdu [SCSI] libiscsi: add more informative failure message during iscsi scsi eh [SCSI] gdth: Add missing call to gdth_ioctl_free [SCSI] bfa: remove unused defintions and misc cleanups [SCSI] bfa: remove inactive functions [SCSI] bfa: replace bfa_assert with WARN_ON [SCSI] qla2xxx: Use sg_next to fetch next sg element while walking sg list. [SCSI] qla2xxx: Fix to avoid recursive lock failure during BSG timeout. [SCSI] qla2xxx: Remove code to not reset ISP82xx on failure. [SCSI] qla2xxx: Display mailbox register 4 during 8012 AEN for ISP82XX parts. [SCSI] qla2xxx: Don't perform a BIG_HAMMER if Get-ID (0x20) mailbox command fails on CNAs. [SCSI] qla2xxx: Remove redundant module parameter permission bits [SCSI] qla2xxx: Add sysfs node for displaying board temperature. [SCSI] qla2xxx: Code cleanup to remove unwanted comments and code. ...
2010-12-21[SCSI] fix the return value of scsi_target_queue_read()Hillf Danton
It seems that zero should be returned if scsi_target_is_busy(starget) is true, no matter if sdev is on the starved list. Signed-off-by: Hillf Danton <dhillf@gmail.com> Reviewed-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2010-12-20Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-blockLinus Torvalds
* 'for-linus' of git://git.kernel.dk/linux-2.6-block: cciss: fix cciss_revalidate panic block: max hardware sectors limit wrapper block: Deprecate QUEUE_FLAG_CLUSTER and use queue_limits instead blk-throttle: Correct the placement of smp_rmb() blk-throttle: Trim/adjust slice_end once a bio has been dispatched block: check for proper length of iov entries earlier in blk_rq_map_user_iov() drbd: fix for spin_lock_irqsave in endio callback drbd: don't recvmsg with zero length
2010-12-17block: Deprecate QUEUE_FLAG_CLUSTER and use queue_limits insteadMartin K. Petersen
When stacking devices, a request_queue is not always available. This forced us to have a no_cluster flag in the queue_limits that could be used as a carrier until the request_queue had been set up for a metadevice. There were several problems with that approach. First of all it was up to the stacking device to remember to set queue flag after stacking had completed. Also, the queue flag and the queue limits had to be kept in sync at all times. We got that wrong, which could lead to us issuing commands that went beyond the max scatterlist limit set by the driver. The proper fix is to avoid having two flags for tracking the same thing. We deprecate QUEUE_FLAG_CLUSTER and use the queue limit directly in the block layer merging functions. The queue_limit 'no_cluster' is turned into 'cluster' to avoid double negatives and to ease stacking. Clustering defaults to being enabled as before. The queue flag logic is removed from the stacking function, and explicitly setting the cluster flag is no longer necessary in DM and MD. Reported-by: Ed Lin <ed.lin@promise.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@kernel.org Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-12-16scsi: replace sr_test_unit_ready() with scsi_test_unit_ready()Tejun Heo
The usage of TUR has been confusing involving several different commits updating different parts over time. Currently, the only differences between scsi_test_unit_ready() and sr_test_unit_ready() are, * scsi_test_unit_ready() also sets sdev->changed on NOT_READY. * scsi_test_unit_ready() returns 0 if TUR ended with UNIT_ATTENTION or NOT_READY. Due to the above two differences, sr is using its own sr_test_unit_ready(), but sd - the sole user of the above extra handling - doesn't even need them. Where scsi_test_unit_ready() is used in sd_media_changed(), the code is looking for device ready w/ media present state which is true iff TUR succeeds w/o sense data or UA, and when the device is not ready for whatever reason sd_media_changed() explicitly marks media as missing so there's no reason to set sdev->changed automatically from scsi_test_unit_ready() on NOT_READY. Drop both special handlings from scsi_test_unit_ready(), which makes it equivalant to sr_test_unit_ready(), and replace sr_test_unit_ready() with scsi_test_unit_ready(). Also, drop the unnecessary explicit NOT_READY check from sd_media_changed(). Checking return value is enough for testing device readiness. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-12-09[SCSI] Eliminate error handler overload of the SCSI serial numberJames Bottomley
The error handler is using the test cmd->serial_number == 0 in the abort routines to signal that the command to be aborted has already completed normally. This design was to close a race window in the original error handler where a command could go through the normal completion routines after it timed out but before error handling was started. Mike Anderson pointed out that when we converted our timeout and softirq completions, we picked up atomicity here because the block layer now mediates this with the REQ_ATOM_COMPLETE flag and guarantees that *either* the command times out or our done routine is called, but ensures we can't get both occurring. That makes the serial number zero check redundant and it can be removed. Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2010-10-25[SCSI] Fix regressions in scsi_internal_device_blockMike Christie
Deleting a SCSI device on a blocked fc_remote_port (before fast_io_fail_tmo fires) results in a hanging thread: STACK: 0 schedule+1108 [0x5cac48] 1 schedule_timeout+528 [0x5cb7fc] 2 wait_for_common+266 [0x5ca6be] 3 blk_execute_rq+160 [0x354054] 4 scsi_execute+324 [0x3b7ef4] 5 scsi_execute_req+162 [0x3b80ca] 6 sd_sync_cache+138 [0x3cf662] 7 sd_shutdown+138 [0x3cf91a] 8 sd_remove+112 [0x3cfe4c] 9 __device_release_driver+124 [0x3a08b8] 10 device_release_driver+60 [0x3a0a5c] 11 bus_remove_device+266 [0x39fa76] 12 device_del+340 [0x39d818] 13 __scsi_remove_device+204 [0x3bcc48] 14 scsi_remove_device+66 [0x3bcc8e] 15 sysfs_schedule_callback_work+50 [0x260d66] 16 worker_thread+622 [0x162326] 17 kthread+160 [0x1680b0] 18 kernel_thread_starter+6 [0x10aaea] During the delete, the SCSI device is in moved to SDEV_CANCEL. When the FC transport class later calls scsi_target_unblock, this has no effect, since scsi_internal_device_unblock ignores SCSI devics in this state. It looks like all these are regressions caused by: 5c10e63c943b4c67561ddc6bf61e01d4141f881f [SCSI] limit state transitions in scsi_internal_device_unblock Fix by rejecting offline and cancel in the state transition. Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com> [jejb: Original patch by Christof Schmitt, modified by Mike Christie] Cc: Stable Tree <stable@kernel.org> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2010-10-22Merge branch 'for-2.6.37/core' of git://git.kernel.dk/linux-2.6-blockLinus Torvalds
* 'for-2.6.37/core' of git://git.kernel.dk/linux-2.6-block: (39 commits) cfq-iosched: Fix a gcc 4.5 warning and put some comments block: Turn bvec_k{un,}map_irq() into static inline functions block: fix accounting bug on cross partition merges block: Make the integrity mapped property a bio flag block: Fix double free in blk_integrity_unregister block: Ensure physical block size is unsigned int blkio-throttle: Fix possible multiplication overflow in iops calculations blkio-throttle: limit max iops value to UINT_MAX blkio-throttle: There is no need to convert jiffies to milli seconds blkio-throttle: Fix link failure failure on i386 blkio: Recalculate the throttled bio dispatch time upon throttle limit change blkio: Add root group to td->tg_list blkio: deletion of a cgroup was causes oops blkio: Do not export throttle files if CONFIG_BLK_DEV_THROTTLING=n block: set the bounce_pfn to the actual DMA limit rather than to max memory block: revert bad fix for memory hotplug causing bounces Fix compile error in blk-exec.c for !CONFIG_DETECT_HUNG_TASK block: set the bounce_pfn to the actual DMA limit rather than to max memory block: Prevent hang_check firing during long I/O cfq: improve fsync performance for small files ... Fix up trivial conflicts due to __rcu sparse annotation in include/linux/genhd.h
2010-09-10block/scsi: Provide a limit on the number of integrity segmentsMartin K. Petersen
Some controllers have a hardware limit on the number of protection information scatter-gather list segments they can handle. Introduce a max_integrity_segments limit in the block layer and provide a new scsi_host_template setting that allows HBA drivers to provide a value suitable for the hardware. Add support for honoring the integrity segment limit when merging both bios and requests. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@carl.home.kernel.dk>
2010-09-09[SCSI] fix use-after-free in scsi_init_io()James Bottomley
we're using a pointer through a freed command to reset the request, which has shown up as an oops with slab poisoning: Reported-by: Tejun Heo <tj@kernel.org> Reported-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2010-08-11scsi: remove superfluous NULL pointer check from scsi_kill_request()Bartlomiej Zolnierkiewicz
Dan's list included: drivers/scsi/scsi_lib.c +1365 scsi_kill_request(9) warning: variable derefenced in initializer 'cmd' drivers/scsi/scsi_lib.c +1365 scsi_kill_request(9) warning: variable derefenced before check 'cmd' We dereference cmd (and possible OOPS if cmd == NULL) before starting the request so just remove the superfluous debugging code altogether. [ bart: the potential NULL pointer dereference was finally fixed in (much later than mine) commit 03b1470 but my patch is still valid ] Reported-by: Dan Carpenter <error27@gmail.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Eugene Teo <eteo@redhat.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-07scsi: fix discard page leakFUJITA Tomonori
We leak a page allocated for discard on some error conditions (e.g. scsi_prep_state_check returns BLKPREP_DEFER in scsi_setup_blk_pc_cmnd). We unprep on requests that weren't prepped in the error path of scsi_init_io. It makes the error path to clean up scsi commands messy. Let's strictly apply the rule that we can't unprep on a request that wasn't prepped. Calling just scsi_put_command() in the error path of scsi_init_io() is enough. We don't set REQ_DONTPREP yet. scsi_setup_discard_cmnd can safely free a page on the error case with the above rule. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-07block: implement an unprep function corresponding directly to prepJames Bottomley
Reviewed-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-07block: remove wrappers for request type/flagsChristoph Hellwig
Remove all the trivial wrappers for the cmd_type and cmd_flags fields in struct requests. This allows much easier grepping for different request types instead of unwinding through macros. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-03-01Merge branch 'for-2.6.34' of git://git.kernel.dk/linux-2.6-blockLinus Torvalds
* 'for-2.6.34' of git://git.kernel.dk/linux-2.6-block: (38 commits) block: don't access jiffies when initialising io_context cfq: remove 8 bytes of padding from cfq_rb_root on 64 bit builds block: fix for "Consolidate phys_segment and hw_segment limits" cfq-iosched: quantum check tweak blktrace: perform cleanup after setup error blkdev: fix merge_bvec_fn return value checks cfq-iosched: requests "in flight" vs "in driver" clarification cciss: Fix problem with scatter gather elements in the scsi half of the driver cciss: eliminate unnecessary pointer use in cciss scsi code cciss: do not use void pointer for scsi hba data cciss: factor out scatter gather chain block mapping code cciss: fix scatter gather chain block dma direction kludge cciss: simplify scatter gather code cciss: factor out scatter gather chain block allocation and freeing cciss: detect bad alignment of scsi commands at build time cciss: clarify command list padding calculation cfq-iosched: rethink seeky detection for SSDs cfq-iosched: rework seeky detection block: remove padding from io_context on 64bit builds block: Consolidate phys_segment and hw_segment limits ...
2010-02-26block: Consolidate phys_segment and hw_segment limitsMartin K. Petersen
Except for SCSI no device drivers distinguish between physical and hardware segment limits. Consolidate the two into a single segment limit. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2010-02-26block: Rename blk_queue_max_sectors to blk_queue_max_hw_sectorsMartin K. Petersen
The block layer calling convention is blk_queue_<limit name>. blk_queue_max_sectors predates this practice, leading to some confusion. Rename the function to appropriately reflect that its intended use is to set max_hw_sectors. Also introduce a temporary wrapper for backwards compability. This can be removed after the merge window is closed. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2010-01-18[SCSI] skip sense logging for some ATA PASS-THROUGH cdbsDouglas Gilbert
Further to the lsml thread titled: "does scsi_io_completion need to dump sense data for ata pass through (ck_cond = 1) ?" This is a patch to skip logging when the sense data is associated with a SENSE_KEY of "RECOVERED_ERROR" and the additional sense code is "ATA PASS-THROUGH INFORMATION AVAILABLE". This only occurs with the SAT ATA PASS-THROUGH commands when CK_COND=1 (in the cdb). It indicates that the sense data contains ATA registers. Smartmontools uses such commands on ATA disks connected via SAT. Periodic checks such as those done by smartd cause nuisance entries into logs that are: - neither errors nor warnings - pointless unless the cdb that caused them are also logged Signed-off-by: Douglas Gilbert <dgilbert@interlog.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2010-01-17[SCSI] scsi_lib: Fix bug in completion of bidi commandsBoaz Harrosh
Because of the terrible structuring of scsi-bidi-commands it breaks some of the life time rules of a scsi-command. It is now not allowed to free up the block-request before cleanup and partial deallocation of the scsi-command. (Which is not so for none bidi commands) The right fix to this problem would be to make bidi command a first citizen by allocating a scsi_sdb pointer at scsi command just like cmd->prot_sdb. The bidi sdb should be allocated/deallocated as part of the get/put_command (Again like the prot_sdb) and the current decoupling of scsi_cmnd and blk-request should be kept. For now make sure scsi_release_buffers() is called before the call to blk_end_request_all() which might cause the suicide of the block requests. At best the leak of bidi buffers, at worse a crash, as there is a race between the existence of the bidi_request and the free of the associated bidi_sdb. The reason this was never hit before is because only OSD has the potential of doing asynchronous bidi commands. (So does bsg but it is never used) And OSD clients just happen to do all their bidi commands synchronously, up until recently. CC: Stable Tree <stable@kernel.org> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2009-12-10[SCSI] Correctly handle thin provisioning write errorMartin K. Petersen
A thin provisioned device may temporarily be out of sufficient allocation units to fulfill a write request. In that case it will return a space allocation in progress error. Wait a bit and retry the write. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2009-12-04[SCSI] scsi_lib: fix potential NULL dereferenceJiri Slaby
Stanse found a potential NULL dereference in scsi_kill_request. Instead of triggering BUG() in 'if (unlikely(cmd == NULL))' branch, the kernel will Oops earlier on cmd dereference. Move the dereferences after the if. Signed-off-by: Jiri Slaby <jirislaby@gmail.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2009-10-29[SCSI] fix propogation of integrity errorsMike Christie
When the Integrity check is done in scsi_io_completion it will set error to -EILSEQ. However, at this point error is no longer used, and blk_end_request_err has -EIO hardcoded. It looks like there was just porting mistake with this patch http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=3e695f89c5debb735e4ff051e9e58d8fb4e95110 and we meant to send error upwards, so this patch changes the hard coded EIO to the error variable. I have only boot tested this patch. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2009-09-14Merge branch 'for-2.6.32' of git://git.kernel.dk/linux-2.6-blockLinus Torvalds
* 'for-2.6.32' of git://git.kernel.dk/linux-2.6-block: (29 commits) block: use blkdev_issue_discard in blk_ioctl_discard Make DISCARD_BARRIER and DISCARD_NOBARRIER writes instead of reads block: don't assume device has a request list backing in nr_requests store block: Optimal I/O limit wrapper cfq: choose a new next_req when a request is dispatched Seperate read and write statistics of in_flight requests aoe: end barrier bios with EOPNOTSUPP block: trace bio queueing trial only when it occurs block: enable rq CPU completion affinity by default cfq: fix the log message after dispatched a request block: use printk_once cciss: memory leak in cciss_init_one() splice: update mtime and atime on files block: make blk_iopoll_prep_sched() follow normal 0/1 return convention cfq-iosched: get rid of must_alloc flag block: use interrupts disabled version of raise_softirq_irqoff() block: fix comment in blk-iopoll.c block: adjust default budget for blk-iopoll block: fix long lines in block/blk-iopoll.c block: add blk-iopoll, a NAPI like approach for block devices ...
2009-09-11scsi,block: update SCSI to handle mixed merge failuresTejun Heo
Update scsi_io_completion() such that it only fails requests till the next error boundary and retry the leftover. This enables block layer to merge requests with different failfast settings and still behave correctly on errors. Allow merge of requests of different failfast settings. As SCSI is currently the only subsystem which follows failfast status, there's no need to worry about other block drivers for now. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Niel Lambrechts <niel.lambrechts@gmail.com> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-08-22[SCSI] Print failed commandsMartin K. Petersen
When a request fails we print the sense data but not the actual command that failed. Add a printout of the operation + CDB for failed commands. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2009-06-21sd, sr: fix Driver 'sd' needs updating messageHannes Reinecke
If a SCSI ULD driver sets blk_queue_prep_rq(), it should clean it up itself on remove(), and not from the bus callbacks. This removes the need to hook into bus->remove(), which should not be used at the same time as driver->remove(). [jejb: fix sdkp initialisation problem due to mismerge] Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-06-12[SCSI] Merge branch 'linus'James Bottomley
Conflicts: drivers/message/fusion/mptsas.c fixed up conflict between req->data_len accessors and mptsas driver updates. Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-05-23[SCSI] limit state transitions in scsi_internal_device_unblockTakahiro Yasui
scsi timeout on two or more devices may cause extremely long execution time for user applications because SDEV_OFFLINE state is changed to SDEV_RUNNING state during scsi error recovery procedures triggered by a bus reset or a host reset of scsi LLD, and scsi timeout can happens on the same devices many times. This happens because scsi_internal_device_unblock() changes device's state to SDEV_RUNNING even if a device in other states than SDEV_BLOCK, while the following two transitions are required in this function. SDEV_BLOCK -> SDEV_RUNNING SDEV_CREATED_BLOCK -> SDEV_CREATED Otherwise, it returns -EINVAL. Signed-off-by: Takahiro Yasui <tyasui@redhat.com> [matthew@wil.cx: supplied rewritten base for patch] Signed-off-by: Matthew Wilcox <matthew@wil.cx> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-05-22Merge branch 'master' into for-2.6.31Jens Axboe
Conflicts: drivers/block/hd.c drivers/block/mg_disk.c Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-05-19scsi_lib: remove unused variableBoaz Harrosh
The last request completion cleanup in scsi_lib left an unused this_count variable in scsi_io_completion(). (It was used before in a code segment that now uses blk_end_request_all()) Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-05-12scsi: fix resid_len mis-conversion in scsi_end_request()Tejun Heo
Commit c3a4d78c580de4edc9ef0f7c59812fb02ceb037f introduced rq->data_len and converted residual count users to it. While converting, it mistakenly converted scsi_end_request() to finish requests with residual count when it wants to do is fully complete the request. Fix it by using blk_end_request_all() instead. This bug was spotted by Boaz Harrosh. Signed-off-by: Tejun Heo <tj@kernel.org> Spotted-by: Boaz Harrosh <bharrosh@panasas.com> Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-05-11scsi: simplify the bidi completionFUJITA Tomonori
Let's use blk_end_request_all() instead of blk_end_bidi_request(). Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-05-11block: implement and enforce request peek/start/fetchTejun Heo
Till now block layer allowed two separate modes of request execution. A request is always acquired from the request queue via elv_next_request(). After that, drivers are free to either dequeue it or process it without dequeueing. Dequeue allows elv_next_request() to return the next request so that multiple requests can be in flight. Executing requests without dequeueing has its merits mostly in allowing drivers for simpler devices which can't do sg to deal with segments only without considering request boundary. However, the benefit this brings is dubious and declining while the cost of the API ambiguity is increasing. Segment based drivers are usually for very old or limited devices and as converting to dequeueing model isn't difficult, it doesn't justify the API overhead it puts on block layer and its more modern users. Previous patches converted all block low level drivers to dequeueing model. This patch completes the API transition by... * renaming elv_next_request() to blk_peek_request() * renaming blkdev_dequeue_request() to blk_start_request() * adding blk_fetch_request() which is combination of peek and start * disallowing completion of queued (not started) requests * applying new API to all LLDs Renamings are for consistency and to break out of tree code so that it's apparent that out of tree drivers need updating. [ Impact: block request issue API cleanup, no functional change ] Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: Mike Miller <mike.miller@hp.com> Cc: unsik Kim <donari75@gmail.com> Cc: Paul Clements <paul.clements@steeleye.com> Cc: Tim Waugh <tim@cyberelk.net> Cc: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com> Cc: David S. Miller <davem@davemloft.net> Cc: Laurent Vivier <Laurent@lvivier.info> Cc: Jeff Garzik <jgarzik@pobox.com> Cc: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: Grant Likely <grant.likely@secretlab.ca> Cc: Adrian McMenamin <adrian@mcmen.demon.co.uk> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Cc: Borislav Petkov <petkovbb@googlemail.com> Cc: Sergei Shtylyov <sshtylyov@ru.mvista.com> Cc: Alex Dubov <oakad@yahoo.com> Cc: Pierre Ossman <drzeus@drzeus.cx> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Markus Lidel <Markus.Lidel@shadowconnect.com> Cc: Stefan Weinhuber <wein@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Pete Zaitcev <zaitcev@redhat.com> Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-05-11block: blk_rq_[cur_]_{sectors|bytes}() usage cleanupTejun Heo
With the previous changes, the followings are now guaranteed for all requests in any valid state. * blk_rq_sectors() == blk_rq_bytes() >> 9 * blk_rq_cur_sectors() == blk_rq_cur_bytes() >> 9 Clean up accessor usages. Notable changes are * nbd,i2o_block: end_all used instead of explicit byte count * scsi_lib: unnecessary conditional on request type removed [ Impact: cleanup ] Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Paul Clements <paul.clements@steeleye.com> Cc: Pete Zaitcev <zaitcev@redhat.com> Cc: Alex Dubov <oakad@yahoo.com> Cc: Markus Lidel <Markus.Lidel@shadowconnect.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-05-11block: cleanup rq->data_len usagesTejun Heo
With recent unification of fields, it's now guaranteed that rq->data_len always equals blk_rq_bytes(). Convert all non-IDE direct users to accessors. IDE will be converted in a separate patch. Boaz: spotted incorrect data_len/resid_len conversion in osd. [ Impact: convert direct rq->data_len usages to blk_rq_bytes() ] Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Cc: Pete Zaitcev <zaitcev@redhat.com> Cc: Eric Moore <Eric.Moore@lsi.com> Cc: Markus Lidel <Markus.Lidel@shadowconnect.com> Cc: Darrick J. Wong <djwong@us.ibm.com> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: Eric Moore <Eric.Moore@lsi.com> Cc: Boaz Harrosh <bharrosh@panasas.com> Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-05-11block: convert to pos and nr_sectors accessorsTejun Heo
With recent cleanups, there is no place where low level driver directly manipulates request fields. This means that the 'hard' request fields always equal the !hard fields. Convert all rq->sectors, nr_sectors and current_nr_sectors references to accessors. While at it, drop superflous blk_rq_pos() < 0 test in swim.c. [ Impact: use pos and nr_sectors accessors ] Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com> Tested-by: Grant Likely <grant.likely@secretlab.ca> Acked-by: Grant Likely <grant.likely@secretlab.ca> Tested-by: Adrian McMenamin <adrian@mcmen.demon.co.uk> Acked-by: Adrian McMenamin <adrian@mcmen.demon.co.uk> Acked-by: Mike Miller <mike.miller@hp.com> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Cc: Borislav Petkov <petkovbb@googlemail.com> Cc: Sergei Shtylyov <sshtylyov@ru.mvista.com> Cc: Eric Moore <Eric.Moore@lsi.com> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Pete Zaitcev <zaitcev@redhat.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Paul Clements <paul.clements@steeleye.com> Cc: Tim Waugh <tim@cyberelk.net> Cc: Jeff Garzik <jgarzik@pobox.com> Cc: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: Alex Dubov <oakad@yahoo.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Dario Ballabio <ballabio_dario@emc.com> Cc: David S. Miller <davem@davemloft.net> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: unsik Kim <donari75@gmail.com> Cc: Laurent Vivier <Laurent@lvivier.info> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>