aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2018-12-10block: delete part_round_stats and switch to less precise countingMikulas Patocka
We want to convert to per-cpu in_flight counters. The function part_round_stats needs the in_flight counter every jiffy, it would be too costly to sum all the percpu variables every jiffy, so it must be deleted. part_round_stats is used to calculate two counters - time_in_queue and io_ticks. time_in_queue can be calculated without part_round_stats, by adding the duration of the I/O when the I/O ends (the value is almost as exact as the previously calculated value, except that time for in-progress I/Os is not counted). io_ticks can be approximated by increasing the value when I/O is started or ended and the jiffies value has changed. If the I/Os take less than a jiffy, the value is as exact as the previously calculated value. If the I/Os take more than a jiffy, io_ticks can drift behind the previously calculated value. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-10block: stop passing 'cpu' to all percpu stats methodsMike Snitzer
All of part_stat_* and related methods are used with preempt disabled, so there is no need to pass cpu around to allow of them. Just call smp_processor_id() as needed. Suggested-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-10dm rq: leverage blk_mq_queue_busy() to check for outstanding IOMike Snitzer
Now that request-based dm-multipath only supports blk-mq, make use of the newly introduced blk_mq_queue_busy() to check for outstanding IO -- rather than (ab)using the block core's in_flight counters. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-10dm: dont rewrite dm_disk(md)->part0.in_flightMikulas Patocka
generic_start_io_acct and generic_end_io_acct already update the variable in_flight using atomic operations, so we don't have to overwrite them again. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-09Merge tag 'v4.20-rc6' into for-4.21/blockJens Axboe
Pull in v4.20-rc6 to resolve the conflict in NVMe, but also to get the two corruption fixes. We're going to be overhauling the direct dispatch path, and we need to do that on top of the changes we made for that in mainline. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-09sbitmap: silence bogus lockdep IRQ warningJens Axboe
Ming reports that lockdep spews the following trace. What this essentially says is that the sbitmap swap_lock was used inconsistently in IRQ enabled and disabled context, and that is usually indicative of a bug that will cause a deadlock. For this case, it's a false positive. The swap_lock is used from process context only, when we swap the bits in the word and cleared mask. We also end up doing that when we are getting a driver tag, from the blk_mq_mark_tag_wait(), and from there we hold the waitqueue lock with IRQs disabled. However, this isn't from an actual IRQ, it's still process context. In lieu of a better way to fix this, simply always disable interrupts when grabbing the swap_lock if lockdep is enabled. [ 100.967642] ================start test sanity/001================ [ 101.238280] null: module loaded [ 106.093735] [ 106.094012] ===================================================== [ 106.094854] WARNING: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected [ 106.095759] 4.20.0-rc3_5d2ee7122c73_for-next+ #1 Not tainted [ 106.096551] ----------------------------------------------------- [ 106.097386] fio/1043 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire: [ 106.098231] 000000004c43fa71 (&(&sb->map[i].swap_lock)->rlock){+.+.}, at: sbitmap_get+0xd5/0x22c [ 106.099431] [ 106.099431] and this task is already holding: [ 106.100229] 000000007eec8b2f (&(&hctx->dispatch_wait_lock)->rlock){....}, at: blk_mq_dispatch_rq_list+0x4c1/0xd7c [ 106.101630] which would create a new lock dependency: [ 106.102326] (&(&hctx->dispatch_wait_lock)->rlock){....} -> (&(&sb->map[i].swap_lock)->rlock){+.+.} [ 106.103553] [ 106.103553] but this new dependency connects a SOFTIRQ-irq-safe lock: [ 106.104580] (&sbq->ws[i].wait){..-.} [ 106.104582] [ 106.104582] ... which became SOFTIRQ-irq-safe at: [ 106.105751] _raw_spin_lock_irqsave+0x4b/0x82 [ 106.106284] __wake_up_common_lock+0x119/0x1b9 [ 106.106825] sbitmap_queue_wake_up+0x33f/0x383 [ 106.107456] sbitmap_queue_clear+0x4c/0x9a [ 106.108046] __blk_mq_free_request+0x188/0x1d3 [ 106.108581] blk_mq_free_request+0x23b/0x26b [ 106.109102] scsi_end_request+0x345/0x5d7 [ 106.109587] scsi_io_completion+0x4b5/0x8f0 [ 106.110099] scsi_finish_command+0x412/0x456 [ 106.110615] scsi_softirq_done+0x23f/0x29b [ 106.111115] blk_done_softirq+0x2a7/0x2e6 [ 106.111608] __do_softirq+0x360/0x6ad [ 106.112062] run_ksoftirqd+0x2f/0x5b [ 106.112499] smpboot_thread_fn+0x3a5/0x3db [ 106.113000] kthread+0x1d4/0x1e4 [ 106.113457] ret_from_fork+0x3a/0x50 [ 106.113969] [ 106.113969] to a SOFTIRQ-irq-unsafe lock: [ 106.114672] (&(&sb->map[i].swap_lock)->rlock){+.+.} [ 106.114674] [ 106.114674] ... which became SOFTIRQ-irq-unsafe at: [ 106.116000] ... [ 106.116003] _raw_spin_lock+0x33/0x64 [ 106.116676] sbitmap_get+0xd5/0x22c [ 106.117134] __sbitmap_queue_get+0xe8/0x177 [ 106.117731] __blk_mq_get_tag+0x1e6/0x22d [ 106.118286] blk_mq_get_tag+0x1db/0x6e4 [ 106.118756] blk_mq_get_driver_tag+0x161/0x258 [ 106.119383] blk_mq_dispatch_rq_list+0x28e/0xd7c [ 106.120043] blk_mq_do_dispatch_sched+0x23a/0x287 [ 106.120607] blk_mq_sched_dispatch_requests+0x379/0x3fc [ 106.121234] __blk_mq_run_hw_queue+0x137/0x17e [ 106.121781] __blk_mq_delay_run_hw_queue+0x80/0x25f [ 106.122366] blk_mq_run_hw_queue+0x151/0x187 [ 106.122887] blk_mq_sched_insert_requests+0x13f/0x175 [ 106.123492] blk_mq_flush_plug_list+0x7d6/0x81b [ 106.124042] blk_flush_plug_list+0x392/0x3d7 [ 106.124557] blk_finish_plug+0x37/0x4f [ 106.125019] read_pages+0x3ef/0x430 [ 106.125446] __do_page_cache_readahead+0x18e/0x2fc [ 106.126027] force_page_cache_readahead+0x121/0x133 [ 106.126621] page_cache_sync_readahead+0x35f/0x3bb [ 106.127229] generic_file_buffered_read+0x410/0x1860 [ 106.127932] __vfs_read+0x319/0x38f [ 106.128415] vfs_read+0xd2/0x19a [ 106.128817] ksys_read+0xb9/0x135 [ 106.129225] do_syscall_64+0x140/0x385 [ 106.129684] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 106.130292] [ 106.130292] other info that might help us debug this: [ 106.130292] [ 106.131226] Chain exists of: [ 106.131226] &sbq->ws[i].wait --> &(&hctx->dispatch_wait_lock)->rlock --> &(&sb->map[i].swap_lock)->rlock [ 106.131226] [ 106.132865] Possible interrupt unsafe locking scenario: [ 106.132865] [ 106.133659] CPU0 CPU1 [ 106.134194] ---- ---- [ 106.134733] lock(&(&sb->map[i].swap_lock)->rlock); [ 106.135318] local_irq_disable(); [ 106.136014] lock(&sbq->ws[i].wait); [ 106.136747] lock(&(&hctx->dispatch_wait_lock)->rlock); [ 106.137742] <Interrupt> [ 106.138110] lock(&sbq->ws[i].wait); [ 106.138625] [ 106.138625] *** DEADLOCK *** [ 106.138625] [ 106.139430] 3 locks held by fio/1043: [ 106.139947] #0: 0000000076ff0fd9 (rcu_read_lock){....}, at: hctx_lock+0x29/0xe8 [ 106.140813] #1: 000000002feb1016 (&sbq->ws[i].wait){..-.}, at: blk_mq_dispatch_rq_list+0x4ad/0xd7c [ 106.141877] #2: 000000007eec8b2f (&(&hctx->dispatch_wait_lock)->rlock){....}, at: blk_mq_dispatch_rq_list+0x4c1/0xd7c [ 106.143267] [ 106.143267] the dependencies between SOFTIRQ-irq-safe lock and the holding lock: [ 106.144351] -> (&sbq->ws[i].wait){..-.} ops: 82 { [ 106.144926] IN-SOFTIRQ-W at: [ 106.145314] _raw_spin_lock_irqsave+0x4b/0x82 [ 106.146042] __wake_up_common_lock+0x119/0x1b9 [ 106.146785] sbitmap_queue_wake_up+0x33f/0x383 [ 106.147567] sbitmap_queue_clear+0x4c/0x9a [ 106.148379] __blk_mq_free_request+0x188/0x1d3 [ 106.149148] blk_mq_free_request+0x23b/0x26b [ 106.149864] scsi_end_request+0x345/0x5d7 [ 106.150546] scsi_io_completion+0x4b5/0x8f0 [ 106.151367] scsi_finish_command+0x412/0x456 [ 106.152157] scsi_softirq_done+0x23f/0x29b [ 106.152855] blk_done_softirq+0x2a7/0x2e6 [ 106.153537] __do_softirq+0x360/0x6ad [ 106.154280] run_ksoftirqd+0x2f/0x5b [ 106.155020] smpboot_thread_fn+0x3a5/0x3db [ 106.155828] kthread+0x1d4/0x1e4 [ 106.156526] ret_from_fork+0x3a/0x50 [ 106.157267] INITIAL USE at: [ 106.157713] _raw_spin_lock_irqsave+0x4b/0x82 [ 106.158542] prepare_to_wait_exclusive+0xa8/0x215 [ 106.159421] blk_mq_get_tag+0x34f/0x6e4 [ 106.160186] blk_mq_get_request+0x48e/0xaef [ 106.160997] blk_mq_make_request+0x27e/0xbd2 [ 106.161828] generic_make_request+0x4d1/0x873 [ 106.162661] submit_bio+0x20c/0x253 [ 106.163379] mpage_bio_submit+0x44/0x4b [ 106.164142] mpage_readpages+0x3c2/0x407 [ 106.164919] read_pages+0x13a/0x430 [ 106.165633] __do_page_cache_readahead+0x18e/0x2fc [ 106.166530] force_page_cache_readahead+0x121/0x133 [ 106.167439] page_cache_sync_readahead+0x35f/0x3bb [ 106.168337] generic_file_buffered_read+0x410/0x1860 [ 106.169255] __vfs_read+0x319/0x38f [ 106.169977] vfs_read+0xd2/0x19a [ 106.170662] ksys_read+0xb9/0x135 [ 106.171356] do_syscall_64+0x140/0x385 [ 106.172120] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 106.173051] } [ 106.173308] ... key at: [<ffffffff85094600>] __key.26481+0x0/0x40 [ 106.174219] ... acquired at: [ 106.174646] _raw_spin_lock+0x33/0x64 [ 106.175183] blk_mq_dispatch_rq_list+0x4c1/0xd7c [ 106.175843] blk_mq_do_dispatch_sched+0x23a/0x287 [ 106.176518] blk_mq_sched_dispatch_requests+0x379/0x3fc [ 106.177262] __blk_mq_run_hw_queue+0x137/0x17e [ 106.177900] __blk_mq_delay_run_hw_queue+0x80/0x25f [ 106.178591] blk_mq_run_hw_queue+0x151/0x187 [ 106.179207] blk_mq_sched_insert_requests+0x13f/0x175 [ 106.179926] blk_mq_flush_plug_list+0x7d6/0x81b [ 106.180571] blk_flush_plug_list+0x392/0x3d7 [ 106.181187] blk_finish_plug+0x37/0x4f [ 106.181737] __se_sys_io_submit+0x171/0x304 [ 106.182346] do_syscall_64+0x140/0x385 [ 106.182895] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 106.183607] [ 106.183830] -> (&(&hctx->dispatch_wait_lock)->rlock){....} ops: 1 { [ 106.184691] INITIAL USE at: [ 106.185119] _raw_spin_lock+0x33/0x64 [ 106.185838] blk_mq_dispatch_rq_list+0x4c1/0xd7c [ 106.186697] blk_mq_do_dispatch_sched+0x23a/0x287 [ 106.187551] blk_mq_sched_dispatch_requests+0x379/0x3fc [ 106.188481] __blk_mq_run_hw_queue+0x137/0x17e [ 106.189307] __blk_mq_delay_run_hw_queue+0x80/0x25f [ 106.190189] blk_mq_run_hw_queue+0x151/0x187 [ 106.190989] blk_mq_sched_insert_requests+0x13f/0x175 [ 106.191902] blk_mq_flush_plug_list+0x7d6/0x81b [ 106.192739] blk_flush_plug_list+0x392/0x3d7 [ 106.193535] blk_finish_plug+0x37/0x4f [ 106.194269] __se_sys_io_submit+0x171/0x304 [ 106.195059] do_syscall_64+0x140/0x385 [ 106.195794] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 106.196705] } [ 106.196950] ... key at: [<ffffffff84880620>] __key.51231+0x0/0x40 [ 106.197853] ... acquired at: [ 106.198270] lock_acquire+0x280/0x2f3 [ 106.198806] _raw_spin_lock+0x33/0x64 [ 106.199337] sbitmap_get+0xd5/0x22c [ 106.199850] __sbitmap_queue_get+0xe8/0x177 [ 106.200450] __blk_mq_get_tag+0x1e6/0x22d [ 106.201035] blk_mq_get_tag+0x1db/0x6e4 [ 106.201589] blk_mq_get_driver_tag+0x161/0x258 [ 106.202237] blk_mq_dispatch_rq_list+0x5b9/0xd7c [ 106.202902] blk_mq_do_dispatch_sched+0x23a/0x287 [ 106.203572] blk_mq_sched_dispatch_requests+0x379/0x3fc [ 106.204316] __blk_mq_run_hw_queue+0x137/0x17e [ 106.204956] __blk_mq_delay_run_hw_queue+0x80/0x25f [ 106.205649] blk_mq_run_hw_queue+0x151/0x187 [ 106.206269] blk_mq_sched_insert_requests+0x13f/0x175 [ 106.206997] blk_mq_flush_plug_list+0x7d6/0x81b [ 106.207644] blk_flush_plug_list+0x392/0x3d7 [ 106.208264] blk_finish_plug+0x37/0x4f [ 106.208814] __se_sys_io_submit+0x171/0x304 [ 106.209415] do_syscall_64+0x140/0x385 [ 106.209965] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 106.210684] [ 106.210904] [ 106.210904] the dependencies between the lock to be acquired [ 106.210905] and SOFTIRQ-irq-unsafe lock: [ 106.212541] -> (&(&sb->map[i].swap_lock)->rlock){+.+.} ops: 1969 { [ 106.213393] HARDIRQ-ON-W at: [ 106.213840] _raw_spin_lock+0x33/0x64 [ 106.214570] sbitmap_get+0xd5/0x22c [ 106.215282] __sbitmap_queue_get+0xe8/0x177 [ 106.216086] __blk_mq_get_tag+0x1e6/0x22d [ 106.216876] blk_mq_get_tag+0x1db/0x6e4 [ 106.217627] blk_mq_get_driver_tag+0x161/0x258 [ 106.218465] blk_mq_dispatch_rq_list+0x28e/0xd7c [ 106.219326] blk_mq_do_dispatch_sched+0x23a/0x287 [ 106.220198] blk_mq_sched_dispatch_requests+0x379/0x3fc [ 106.221138] __blk_mq_run_hw_queue+0x137/0x17e [ 106.221975] __blk_mq_delay_run_hw_queue+0x80/0x25f [ 106.222874] blk_mq_run_hw_queue+0x151/0x187 [ 106.223686] blk_mq_sched_insert_requests+0x13f/0x175 [ 106.224597] blk_mq_flush_plug_list+0x7d6/0x81b [ 106.225444] blk_flush_plug_list+0x392/0x3d7 [ 106.226255] blk_finish_plug+0x37/0x4f [ 106.227006] read_pages+0x3ef/0x430 [ 106.227717] __do_page_cache_readahead+0x18e/0x2fc [ 106.228595] force_page_cache_readahead+0x121/0x133 [ 106.229491] page_cache_sync_readahead+0x35f/0x3bb [ 106.230373] generic_file_buffered_read+0x410/0x1860 [ 106.231277] __vfs_read+0x319/0x38f [ 106.231986] vfs_read+0xd2/0x19a [ 106.232666] ksys_read+0xb9/0x135 [ 106.233350] do_syscall_64+0x140/0x385 [ 106.234097] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 106.235012] SOFTIRQ-ON-W at: [ 106.235460] _raw_spin_lock+0x33/0x64 [ 106.236195] sbitmap_get+0xd5/0x22c [ 106.236913] __sbitmap_queue_get+0xe8/0x177 [ 106.237715] __blk_mq_get_tag+0x1e6/0x22d [ 106.238488] blk_mq_get_tag+0x1db/0x6e4 [ 106.239244] blk_mq_get_driver_tag+0x161/0x258 [ 106.240079] blk_mq_dispatch_rq_list+0x28e/0xd7c [ 106.240937] blk_mq_do_dispatch_sched+0x23a/0x287 [ 106.241806] blk_mq_sched_dispatch_requests+0x379/0x3fc [ 106.242751] __blk_mq_run_hw_queue+0x137/0x17e [ 106.243579] __blk_mq_delay_run_hw_queue+0x80/0x25f [ 106.244469] blk_mq_run_hw_queue+0x151/0x187 [ 106.245277] blk_mq_sched_insert_requests+0x13f/0x175 [ 106.246191] blk_mq_flush_plug_list+0x7d6/0x81b [ 106.247044] blk_flush_plug_list+0x392/0x3d7 [ 106.247859] blk_finish_plug+0x37/0x4f [ 106.248749] read_pages+0x3ef/0x430 [ 106.249463] __do_page_cache_readahead+0x18e/0x2fc [ 106.250357] force_page_cache_readahead+0x121/0x133 [ 106.251263] page_cache_sync_readahead+0x35f/0x3bb [ 106.252157] generic_file_buffered_read+0x410/0x1860 [ 106.253084] __vfs_read+0x319/0x38f [ 106.253808] vfs_read+0xd2/0x19a [ 106.254488] ksys_read+0xb9/0x135 [ 106.255186] do_syscall_64+0x140/0x385 [ 106.255943] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 106.256867] INITIAL USE at: [ 106.257300] _raw_spin_lock+0x33/0x64 [ 106.258033] sbitmap_get+0xd5/0x22c [ 106.258747] __sbitmap_queue_get+0xe8/0x177 [ 106.259542] __blk_mq_get_tag+0x1e6/0x22d [ 106.260320] blk_mq_get_tag+0x1db/0x6e4 [ 106.261072] blk_mq_get_driver_tag+0x161/0x258 [ 106.261902] blk_mq_dispatch_rq_list+0x28e/0xd7c [ 106.262762] blk_mq_do_dispatch_sched+0x23a/0x287 [ 106.263626] blk_mq_sched_dispatch_requests+0x379/0x3fc [ 106.264571] __blk_mq_run_hw_queue+0x137/0x17e [ 106.265409] __blk_mq_delay_run_hw_queue+0x80/0x25f [ 106.266302] blk_mq_run_hw_queue+0x151/0x187 [ 106.267111] blk_mq_sched_insert_requests+0x13f/0x175 [ 106.268028] blk_mq_flush_plug_list+0x7d6/0x81b [ 106.268878] blk_flush_plug_list+0x392/0x3d7 [ 106.269694] blk_finish_plug+0x37/0x4f [ 106.270432] read_pages+0x3ef/0x430 [ 106.271139] __do_page_cache_readahead+0x18e/0x2fc [ 106.272040] force_page_cache_readahead+0x121/0x133 [ 106.272932] page_cache_sync_readahead+0x35f/0x3bb [ 106.273811] generic_file_buffered_read+0x410/0x1860 [ 106.274709] __vfs_read+0x319/0x38f [ 106.275407] vfs_read+0xd2/0x19a [ 106.276074] ksys_read+0xb9/0x135 [ 106.276764] do_syscall_64+0x140/0x385 [ 106.277500] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 106.278417] } [ 106.278676] ... key at: [<ffffffff85094640>] __key.26212+0x0/0x40 [ 106.279586] ... acquired at: [ 106.280026] lock_acquire+0x280/0x2f3 [ 106.280559] _raw_spin_lock+0x33/0x64 [ 106.281101] sbitmap_get+0xd5/0x22c [ 106.281610] __sbitmap_queue_get+0xe8/0x177 [ 106.282221] __blk_mq_get_tag+0x1e6/0x22d [ 106.282809] blk_mq_get_tag+0x1db/0x6e4 [ 106.283368] blk_mq_get_driver_tag+0x161/0x258 [ 106.284018] blk_mq_dispatch_rq_list+0x5b9/0xd7c [ 106.284685] blk_mq_do_dispatch_sched+0x23a/0x287 [ 106.285371] blk_mq_sched_dispatch_requests+0x379/0x3fc [ 106.286135] __blk_mq_run_hw_queue+0x137/0x17e [ 106.286806] __blk_mq_delay_run_hw_queue+0x80/0x25f [ 106.287515] blk_mq_run_hw_queue+0x151/0x187 [ 106.288149] blk_mq_sched_insert_requests+0x13f/0x175 [ 106.289041] blk_mq_flush_plug_list+0x7d6/0x81b [ 106.289912] blk_flush_plug_list+0x392/0x3d7 [ 106.290590] blk_finish_plug+0x37/0x4f [ 106.291238] __se_sys_io_submit+0x171/0x304 [ 106.291864] do_syscall_64+0x140/0x385 [ 106.292534] entry_SYSCALL_64_after_hwframe+0x49/0xbe Reported-by: Ming Lei <ming.lei@redhat.com> Tested-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-09Linux 4.20-rc6Linus Torvalds
2018-12-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: "A decent batch of fixes here. I'd say about half are for problems that have existed for a while, and half are for new regressions added in the 4.20 merge window. 1) Fix 10G SFP phy module detection in mvpp2, from Baruch Siach. 2) Revert bogus emac driver change, from Benjamin Herrenschmidt. 3) Handle BPF exported data structure with pointers when building 32-bit userland, from Daniel Borkmann. 4) Memory leak fix in act_police, from Davide Caratti. 5) Check RX checksum offload in RX descriptors properly in aquantia driver, from Dmitry Bogdanov. 6) SKB unlink fix in various spots, from Edward Cree. 7) ndo_dflt_fdb_dump() only works with ethernet, enforce this, from Eric Dumazet. 8) Fix FID leak in mlxsw driver, from Ido Schimmel. 9) IOTLB locking fix in vhost, from Jean-Philippe Brucker. 10) Fix SKB truesize accounting in ipv4/ipv6/netfilter frag memory limits otherwise namespace exit can hang. From Jiri Wiesner. 11) Address block parsing length fixes in x25 from Martin Schiller. 12) IRQ and ring accounting fixes in bnxt_en, from Michael Chan. 13) For tun interfaces, only iface delete works with rtnl ops, enforce this by disallowing add. From Nicolas Dichtel. 14) Use after free in liquidio, from Pan Bian. 15) Fix SKB use after passing to netif_receive_skb(), from Prashant Bhole. 16) Static key accounting and other fixes in XPS from Sabrina Dubroca. 17) Partially initialized flow key passed to ip6_route_output(), from Shmulik Ladkani. 18) Fix RTNL deadlock during reset in ibmvnic driver, from Thomas Falcon. 19) Several small TCP fixes (off-by-one on window probe abort, NULL deref in tail loss probe, SNMP mis-estimations) from Yuchung Cheng" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (93 commits) net/sched: cls_flower: Reject duplicated rules also under skip_sw bnxt_en: Fix _bnxt_get_max_rings() for 57500 chips. bnxt_en: Fix NQ/CP rings accounting on the new 57500 chips. bnxt_en: Keep track of reserved IRQs. bnxt_en: Fix CNP CoS queue regression. net/mlx4_core: Correctly set PFC param if global pause is turned off. Revert "net/ibm/emac: wrong bit is used for STA control" neighbour: Avoid writing before skb->head in neigh_hh_output() ipv6: Check available headroom in ip6_xmit() even without options tcp: lack of available data can also cause TSO defer ipv6: sr: properly initialize flowi6 prior passing to ip6_route_output mlxsw: spectrum_switchdev: Fix VLAN device deletion via ioctl mlxsw: spectrum_router: Relax GRE decap matching check mlxsw: spectrum_switchdev: Avoid leaking FID's reference count mlxsw: spectrum_nve: Remove easily triggerable warnings ipv4: ipv6: netfilter: Adjust the frag mem limit when truesize changes sctp: frag_point sanity check tcp: fix NULL ref in tail loss probe tcp: Do not underestimate rwnd_limited net: use skb_list_del_init() to remove from RX sublists ...
2018-12-09Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Three fixes: a boot parameter re-(re-)fix, a retpoline build artifact fix and an LLVM workaround" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/vdso: Drop implicit common-page-size linker flag x86/build: Fix compiler support check for CONFIG_RETPOLINE x86/boot: Clear RSDP address in boot_params for broken loaders
2018-12-09Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull kprobes fixes from Ingo Molnar: "Two kprobes fixes: a blacklist fix and an instruction patching related corruption fix" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: kprobes/x86: Blacklist non-attachable interrupt functions kprobes/x86: Fix instruction patching corruption when copying more than one RIP-relative instruction
2018-12-09Merge branch 'efi-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull EFI fixes from Ingo Molnar: "Two fixes: a large-system fix and an earlyprintk fix with certain resolutions" * 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/earlyprintk/efi: Fix infinite loop on some screen widths x86/efi: Allocate e820 buffer before calling efi_exit_boot_service
2018-12-09net/sched: cls_flower: Reject duplicated rules also under skip_swOr Gerlitz
Currently, duplicated rules are rejected only for skip_hw or "none", hence allowing users to push duplicates into HW for no reason. Use the flower tables to protect for that. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Paul Blakey <paulb@mellanox.com> Reported-by: Chris Mi <chrism@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-09Merge branch 'bnxt_en-Bug-fixes'David S. Miller
Michael Chan says: ==================== bnxt_en: Bug fixes. The first patch fixes a regression on CoS queue setup, introduced recently by the 57500 new chip support patches. The rest are fixes related to ring and resource accounting on the new 57500 chips. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-09bnxt_en: Fix _bnxt_get_max_rings() for 57500 chips.Michael Chan
The CP rings are accounted differently on the new 57500 chips. There must be enough CP rings for the sum of RX and TX rings on the new chips. The current logic may be over-estimating the RX and TX rings. The output parameter max_cp should be the maximum NQs capped by MSIX vectors available for networking in the context of 57500 chips. The existing code which uses CMPL rings capped by the MSIX vectors works most of the time but is not always correct. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-09bnxt_en: Fix NQ/CP rings accounting on the new 57500 chips.Michael Chan
The new 57500 chips have introduced the NQ structure in addition to the existing CP rings in all chips. We need to introduce a new bnxt_nq_rings_in_use(). On legacy chips, the 2 functions are the same and one will just call the other. On the new chips, they refer to the 2 separate ring structures. The new function is now called to determine the resource (NQ or CP rings) associated with MSIX that are in use. On 57500 chips, the RDMA driver does not use the CP rings so we don't need to do the subtraction adjustment. Fixes: 41e8d7983752 ("bnxt_en: Modify the ring reservation functions for 57500 series chips.") Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-09bnxt_en: Keep track of reserved IRQs.Michael Chan
The new 57500 chips use 1 NQ per MSIX vector, whereas legacy chips use 1 CP ring per MSIX vector. To better unify this, add a resv_irqs field to struct bnxt_hw_resc. On legacy chips, we initialize resv_irqs with resv_cp_rings. On new chips, we initialize it with the allocated MSIX resources. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-09bnxt_en: Fix CNP CoS queue regression.Michael Chan
Recent changes to support the 57500 devices have created this regression. The bnxt_hwrm_queue_qportcfg() call was moved to be called earlier before the RDMA support was determined, causing the CoS queues configuration to be set before knowing whether RDMA was supported or not. Fix it by moving it to the right place right after RDMA support is determined. Fixes: 98f04cf0f1fc ("bnxt_en: Check context memory requirements from firmware.") Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-09Merge tag 'char-misc-4.20-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc driver fixes from Greg KH: "Here are some small driver fixes for 4.20-rc6. There is a hyperv fix that for some reaon took forever to get into a shape that could be applied to the tree properly, but resolves a much reported issue. The others are some gnss patches, one a bugfix and the two others updates to the MAINTAINERS file to properly match the gnss files in the tree. All have been in linux-next for a while with no reported issues" * tag 'char-misc-4.20-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: MAINTAINERS: exclude gnss from SIRFPRIMA2 regex matching MAINTAINERS: add gnss scm tree gnss: sirf: fix activation retry handling Drivers: hv: vmbus: Offload the handling of channels to two workqueues
2018-12-09Merge tag 'staging-4.20-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging Pull staging fixes from Greg KH: "Here are two staging driver bugfixes for 4.20-rc6. One is a revert of a previously incorrect patch that was merged a while ago, and the other resolves a possible buffer overrun that was found by code inspection. Both of these have been in the linux-next tree with no reported issues" * tag 'staging-4.20-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: Revert commit ef9209b642f "staging: rtl8723bs: Fix indenting errors and an off-by-one mistake in core/rtw_mlme_ext.c" staging: rtl8712: Fix possible buffer overrun
2018-12-09Merge tag 'tty-4.20-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull tty driver fixes from Greg KH: "Here are three small tty driver fixes for 4.20-rc6 Nothing major, just some bug fixes for reported issues. Full details are in the shortlog. All of these have been in linux-next for a while with no reported issues" * tag 'tty-4.20-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: kgdboc: fix KASAN global-out-of-bounds bug in param_set_kgdboc_var() tty: serial: 8250_mtk: always resume the device in probe. tty: do not set TTY_IO_ERROR flag if console port
2018-12-09Merge tag 'usb-4.20-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb Pull USB fixes from Greg KH: "Here are some small USB fixes for 4.20-rc6 The "largest" here are some xhci fixes for reported issues. Also here is a USB core fix, some quirk additions, and a usb-serial fix which required the export of one of the tty layer's functions to prevent code duplication. The tty maintainer agreed with this change. All of these have been in linux-next with no reported issues" * tag 'usb-4.20-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: xhci: Prevent U1/U2 link pm states if exit latency is too long xhci: workaround CSS timeout on AMD SNPS 3.0 xHC USB: check usb_get_extra_descriptor for proper size USB: serial: console: fix reported terminal settings usb: quirk: add no-LPM quirk on SanDisk Ultra Flair device USB: Fix invalid-free bug in port_over_current_notify() usb: appledisplay: Add 27" Apple Cinema Display
2018-12-09Merge tag '4.20-rc5-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull cifs fixes from Steve French: "Three small fixes: a fix for smb3 direct i/o, a fix for CIFS DFS for stable and a minor cifs Kconfig fix" * tag '4.20-rc5-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6: CIFS: Avoid returning EBUSY to upper layer VFS cifs: Fix separator when building path from dentry cifs: In Kconfig CONFIG_CIFS_POSIX needs depends on legacy (insecure cifs)
2018-12-09Merge tag 'dax-fixes-4.20-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull dax fixes from Dan Williams: "The last of the known regression fixes and fallout from the Xarray conversion of the filesystem-dax implementation. On the path to debugging why the dax memory-failure injection test started failing after the Xarray conversion a couple more fixes for the dax_lock_mapping_entry(), now called dax_lock_page(), surfaced. Those plus the bug that started the hunt are now addressed. These patches have appeared in a -next release with no issues reported. Note the touches to mm/memory-failure.c are just the conversion to the new function signature for dax_lock_page(). Summary: - Fix the Xarray conversion of fsdax to properly handle dax_lock_mapping_entry() in the presense of pmd entries - Fix inode destruction racing a new lock request" * tag 'dax-fixes-4.20-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: dax: Fix unlock mismatch with updated API dax: Don't access a freed inode dax: Check page->mapping isn't NULL
2018-12-09Merge tag 'libnvdimm-fixes-4.20-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm fixes from Dan Williams: "A regression fix for the Address Range Scrub implementation, yes another one, and support for platforms that misalign persistent memory relative to the Linux memory hotplug section constraint. Longer term, support for sub-section memory hotplug would alleviate alignment waste, but until then this hack allows a 'struct page' memmap to be established for these misaligned memory regions. These have all appeared in a -next release, and thanks to Patrick for reporting and testing the alignment padding fix. Summary: - Unless and until the core mm handles memory hotplug units smaller than a section (128M), persistent memory namespaces must be padded to section alignment. The libnvdimm core already handled section collision with "System RAM", but some configurations overlap independent "Persistent Memory" ranges within a section, so additional padding injection is added for that case. - The recent reworks of the ARS (address range scrub) state machine to reduce the number of state flags inadvertantly missed a conversion of acpi_nfit_ars_rescan() call sites. Fix the regression whereby user-requested ARS results in a "short" scrub rather than a "long" scrub. - Fixup the unit tests to handle / test the 128M section alignment of mocked test resources. * tag 'libnvdimm-fixes-4.20-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: acpi/nfit: Fix user-initiated ARS to be "ARS-long" rather than "ARS-short" libnvdimm, pfn: Pad pfn namespaces relative to other regions tools/testing/nvdimm: Align test resources to 128M
2018-12-08net/mlx4_core: Correctly set PFC param if global pause is turned off.Tarick Bedeir
rx_ppp and tx_ppp can be set between 0 and 255, so don't clamp to 1. Fixes: 6e8814ceb7e8 ("net/mlx4_en: Fix mixed PFC and Global pause user control requests") Signed-off-by: Tarick Bedeir <tarick@google.com> Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-08Merge branch 'fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal Pull thermal SoC fixes from Eduardo Valentin: "Fixes for armada and broadcom thermal drivers" * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal: thermal: broadcom: constify thermal_zone_of_device_ops structure thermal: armada: constify thermal_zone_of_device_ops structure thermal: bcm2835: Switch to SPDX identifier thermal: armada: fix legacy resource fixup thermal: armada: fix legacy validity test sense
2018-12-08Merge tag 'asm-generic-4.20' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic Pull asm-generic fix from Arnd Bergmann: "Multiple people reported a bug I introduced in asm-generic/unistd.h in 4.20, this is the obvious bugfix to get glibc and others to correctly build again on new architectures that no longer provide the old fstatat64() family of system calls" * tag 'asm-generic-4.20' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: asm-generic: unistd.h: fixup broken macro include.
2018-12-08Merge tag 'clk-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux Pull clk fixes from Stephen Boyd: "A few clk driver fixes this time: - Introduce protected-clock DT binding to fix breakage on qcom sdm845-mtp boards where the qspi clks introduced this merge window cause the firmware on those boards to take down the system if we try to read the clk registers - Fix a couple off-by-one errors found by Dan Carpenter - Handle failure in zynq fixed factor clk driver to avoid using uninitialized data" * tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: clk: zynqmp: Off by one in zynqmp_is_valid_clock() clk: mmp: Off by one in mmp_clk_add() clk: mvebu: Off by one bugs in cp110_of_clk_get() arm64: dts: qcom: sdm845-mtp: Mark protected gcc clocks clk: qcom: Support 'protected-clocks' property dt-bindings: clk: Introduce 'protected-clocks' property clk: zynqmp: handle fixed factor param query error
2018-12-08Merge tag 'xfs-4.20-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds
Pull xfs fixes from Darrick Wong: "Here are hopefully the last set of fixes for 4.20. There's a fix for a longstanding statfs reporting problem with project quotas, a correction for page cache invalidation behaviors when fallocating near EOF, and a fix for a broken metadata verifier return code. Finally, the most important fix is to the pipe splicing code (aka the generic copy_file_range fallback) to avoid pointless short directio reads by only asking the filesystem for as much data as there are available pages in the pipe buffer. Our previous fix (simulated short directio reads because the number of pages didn't match the length of the read requested) caused subtle problems on overlayfs, so that part is reverted. Anyhow, this series passes fstests -g all on xfs and overlay+xfs, and has passed 17 billion fsx operations problem-free since I started testing Summary: - Fix broken project quota inode counts - Fix incorrect PAGE_MASK/PAGE_SIZE usage - Fix incorrect return value in btree verifier - Fix WARN_ON remap flags false positive - Fix splice read overflows" * tag 'xfs-4.20-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: iomap: partially revert 4721a601099 (simulated directio short read on EFAULT) splice: don't read more than available pipe space vfs: allow some remap flags to be passed to vfs_clone_file_range xfs: fix inverted return from xfs_btree_sblock_verify_crc xfs: fix PAGE_MASK usage in xfs_free_file_space fs/xfs: fix f_ffree value for statfs when project quota is set
2018-12-08Revert "mm, thp: consolidate THP gfp handling into ↵David Rientjes
alloc_hugepage_direct_gfpmask" This reverts commit 89c83fb539f95491be80cdd5158e6f0ce329e317. This should have been done as part of 2f0799a0ffc0 ("mm, thp: restore node-local hugepage allocations"). The movement of the thp allocation policy from alloc_pages_vma() to alloc_hugepage_direct_gfpmask() was intended to only set __GFP_THISNODE for mempolicies that are not MPOL_BIND whereas the revert could set this regardless of mempolicy. While the check for MPOL_BIND between alloc_hugepage_direct_gfpmask() and alloc_pages_vma() was racy, that has since been removed since the revert. What is left is the possibility to use __GFP_THISNODE in policy_node() when it is unexpected because the special handling for hugepages in alloc_pages_vma() was removed as part of the consolidation. Secondly, prior to 89c83fb539f9, alloc_pages_vma() implemented a somewhat different policy for hugepage allocations, which were allocated through alloc_hugepage_vma(). For hugepage allocations, if the allocating process's node is in the set of allowed nodes, allocate with __GFP_THISNODE for that node (for MPOL_PREFERRED, use that node with __GFP_THISNODE instead). This was changed for shmem_alloc_hugepage() to allow fallback to other nodes in 89c83fb539f9 as it did for new_page() in mm/mempolicy.c which is functionally different behavior and removes the requirement to only allocate hugepages locally. So this commit does a full revert of 89c83fb539f9 instead of the partial revert that was done in 2f0799a0ffc0. The result is the same thp allocation policy for 4.20 that was in 4.19. Fixes: 89c83fb539f9 ("mm, thp: consolidate THP gfp handling into alloc_hugepage_direct_gfpmask") Fixes: 2f0799a0ffc0 ("mm, thp: restore node-local hugepage allocations") Signed-off-by: David Rientjes <rientjes@google.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-12-07Revert "net/ibm/emac: wrong bit is used for STA control"Benjamin Herrenschmidt
This reverts commit 624ca9c33c8a853a4a589836e310d776620f4ab9. This commit is completely bogus. The STACR register has two formats, old and new, depending on the version of the IP block used. There's a pair of device-tree properties that can be used to specify the format used: has-inverted-stacr-oc has-new-stacr-staopc What this commit did was to change the bit definition used with the old parts to match the new parts. This of course breaks the driver on all the old ones. Instead, the author should have set the appropriate properties in the device-tree for the variant used on his board. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-07scsi: Fix a harmless double shift bugDan Carpenter
Smatch generates a warning: drivers/scsi/scsi_lib.c:1656 scsi_mq_done() warn: test_bit() takes a bit number The problem is that SCMD_STATE_COMPLETE is supposed to be bit number 0 and not a mask like "(1 << 0)". It is used like this: if (test_and_set_bit(SCMD_STATE_COMPLETE, &scmd->state)) The test_and_set_bit() has a shift built in so it's a double left shift and uses bit number 1 instead of number 0. This bug is harmless because it's done consistently and it doesn't clash with any other flags. Fixes: f1342709d18a ("scsi: Do not rely on blk-mq for double completions") Reviewed-by: Keith Busch <keith.busch@intel.com> Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-07nvme: remove unused function nvme_ctrl_readyIsrael Rukshin
Signed-off-by: Israel Rukshin <israelr@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-07nvme: implement Enhanced Command RetryKeith Busch
A controller may have an internal state that is not able to successfully process commands for a short duration. In such states, an immediate command requeue is expected to fail. The driver may exceed its max retry count, which permanently ends the command in failure when the same command would succeed after waiting for the controller to be ready. NVMe ratified TP 4033 provides a delay hint in the completion status code for failed commands. Implement the retry delay based on the command completion status and the controller's requested delay. Note that requeued commands are handled per request_queue, not per individual request. If multiple commands fail, the controller should consistently report the desired delay time for retryable commands in all CQEs, otherwise the requeue list may be kicked too soon. Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-07nvmet: fix the structure member indentationChaitanya Kulkarni
This is a cleanup patch which fixes the structure member indentation introduced by the p2p: commit c6925093d0b2 ("nvmet: Optionally use PCI P2P memory"). We don't change any functionality in this patch. This is needed so that any future members will also follow the uniform indentation. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Acked-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-07nvmet: use unlikely for req status checkChaitanya Kulkarni
This patch adds unlikely in the nvmet request completion path for the status check in the low level function __nvmet_req_complete. This is helpful in the scenario where host and target connection is working smoothly. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-07nvmet-rdma: Add unlikely for response allocated checkIsrael Rukshin
Signed-off-by: Israel Rukshin <israelr@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-07nvme: Remove unused forward declarationIsrael Rukshin
Signed-off-by: Israel Rukshin <israelr@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-07nvme: disable fabrics SQ flow control when asked by the userSagi Grimberg
As for now, we don't care about sq_head pointer updates anyway, so at least allow the controller to micro-optimize by omiting this update. Note that we will probably need to support it when a controller that requires this comes along. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-07nvmet: expose support for fabrics SQ flow control disable in treqSagi Grimberg
Technical Proposal introduces an indication for SQ flow control disable support. Expose it since we are able to operate in this mode. Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-07nvmet: don't override treq upon modification.Sagi Grimberg
Only override the allowed parts of it. Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> [hch: slight tweak to the NVME_TREQ_SECURE_CHANNEL_MASK definition] Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-07nvmet: support fabrics sq flow controlSagi Grimberg
Technical proposal 8005 "fabrics SQ flow control" introduces a mode where a host and controller agree to omit sq_head pointer updates when sending nvme completions. In case the host indicated desire to operate in this mode (connect attribute) the controller will return back a connect completion with sq_head value of 0xffff as indication that it will omit sq_head pointer updates. This mode saves us an atomic update in the I/O path. Reviewed-by: Hannes Reinecke <hare@suse.com> [hch: suggested better implementation] Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-07nvmet-fc: remove the IN_ISR deferred scheduling optionsJames Smart
All target lldd's call the cmd receive and op completions in non-isr thread contexts. As such the IN_ISR options are not necessary. Remove the functionality and flags, which also removes cpu assignments to queues. Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-07nvmet: mark nvmet_genctr staticChristoph Hellwig
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-07nvmet: enable Discovery Controller AENsJay Sternberg
Add functions to find connections requesting Discovery Change events and send a notification to hosts that maintain an explicit persistent connection and have and active Asynchronous Event Request pending. Only Hosts that have access to the Subsystem effected by the change will receive notifications of Discovery Change event. Call these functions each time there is a configfs change that effects the Discover Log Pages. Set the OAES field in the Identify Controller response to advertise the support for Asynchronous Event Notifications. Signed-off-by: Jay Sternberg <jay.e.sternberg@intel.com> Reviewed-by: Phil Cayton <phil.cayton@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-07nvmet: allow host connect even if no allowed subsystems are exportedSagi Grimberg
It is perfectly valid that a host connects to a discovery subsystem and gets an empty discovery log page since no subsystems are provisioned to it. No reason to disallow connecting to the discovery subsystem all together. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Jay Sternberg <jay.e.sternberg@intel.com> Reviewed-by: Phil Cayton <phil.cayton@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-07nvmet: add support to Discovery controllers for commandsJay Sternberg
Add custom get/set features to commands allowed by Discovery controllers. Signed-off-by: Jay Sternberg <jay.e.sternberg@intel.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-07nvmet: add defines for discovery change async eventsJay Sternberg
Add AEN/AER values as defined by the specification Signed-off-by: Jay Sternberg <jay.e.sternberg@intel.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-07nvmet: make kato and AEN processing for use by other controllersJay Sternberg
Make common process of get/set features available to other controllers by making simple functions static inline and others not static and prototypes in nvmet.h file Also remove static from nvmet_execute_async_event and add prototype to nvmet.h to allow used by other controllers Signed-off-by: Jay Sternberg <jay.e.sternberg@intel.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-07nvmet: allow Keep Alive for Discovery controllerJay Sternberg
Per change to specification allowing Discovery controllers to have explicit persistent connections, remove restriction on Discovery controllers allowing kato on connect. Signed-off-by: Jay Sternberg <jay.e.sternberg@intel.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>