aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2020-02-21net, sk_msg: Annotate lockless access to sk_prot on cloneJakub Sitnicki
sk_msg and ULP frameworks override protocol callbacks pointer in sk->sk_prot, while tcp accesses it locklessly when cloning the listening socket, that is with neither sk_lock nor sk_callback_lock held. Once we enable use of listening sockets with sockmap (and hence sk_msg), there will be shared access to sk->sk_prot if socket is getting cloned while being inserted/deleted to/from the sockmap from another CPU: Read side: tcp_v4_rcv sk = __inet_lookup_skb(...) tcp_check_req(sk) inet_csk(sk)->icsk_af_ops->syn_recv_sock tcp_v4_syn_recv_sock tcp_create_openreq_child inet_csk_clone_lock sk_clone_lock READ_ONCE(sk->sk_prot) Write side: sock_map_ops->map_update_elem sock_map_update_elem sock_map_update_common sock_map_link_no_progs tcp_bpf_init tcp_bpf_update_sk_prot sk_psock_update_proto WRITE_ONCE(sk->sk_prot, ops) sock_map_ops->map_delete_elem sock_map_delete_elem __sock_map_delete sock_map_unref sk_psock_put sk_psock_drop sk_psock_restore_proto tcp_update_ulp WRITE_ONCE(sk->sk_prot, proto) Mark the shared access with READ_ONCE/WRITE_ONCE annotations. Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200218171023.844439-2-jakub@cloudflare.com
2020-02-20docs/bpf: Update bpf development Q/A fileYonghong Song
bpf now has its own mailing list bpf@vger.kernel.org. Update the bpf_devel_QA.rst file to reflect this. Also llvm has switch to github with llvm and clang in the same repo https://github.com/llvm/llvm-project.git. Update the QA file with newer build instructions. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20200221004354.930952-1-yhs@fb.com
2020-02-20selftests/bpf: Fix trampoline_count clean up logicAndrii Nakryiko
Libbpf's Travis CI tests caught this issue. Ensure bpf_link and bpf_object clean up is performed correctly. Fixes: d633d57902a5 ("selftest/bpf: Add test for allowed trampolines count") Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/bpf/20200220230546.769250-1-andriin@fb.com
2020-02-20Merge branch 'set_attach_target'Alexei Starovoitov
Eelco Chaudron says: ==================== Currently when you want to attach a trace program to a bpf program the section name needs to match the tracepoint/function semantics. However the addition of the bpf_program__set_attach_target() API allows you to specify the tracepoint/function dynamically. The call flow would look something like this: xdp_fd = bpf_prog_get_fd_by_id(id); trace_obj = bpf_object__open_file("func.o", NULL); prog = bpf_object__find_program_by_title(trace_obj, "fentry/myfunc"); bpf_program__set_expected_attach_type(prog, BPF_TRACE_FENTRY); bpf_program__set_attach_target(prog, xdp_fd, "xdpfilt_blk_all"); bpf_object__load(trace_obj) v1 -> v2: Remove requirement for attach type hint in API v2 -> v3: Moved common warning to __find_vmlinux_btf_id, requested by Andrii Updated the xdp_bpf2bpf test to use this new API v3 -> v4: Split up patch, update libbpf.map version v4 -> v5: Fix return code, and prog assignment in test case ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-02-20selftests/bpf: Update xdp_bpf2bpf test to use new set_attach_target APIEelco Chaudron
Use the new bpf_program__set_attach_target() API in the xdp_bpf2bpf selftest so it can be referenced as an example on how to use it. Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/158220520562.127661.14289388017034825841.stgit@xdp-tutorial
2020-02-20libbpf: Add support for dynamic program attach targetEelco Chaudron
Currently when you want to attach a trace program to a bpf program the section name needs to match the tracepoint/function semantics. However the addition of the bpf_program__set_attach_target() API allows you to specify the tracepoint/function dynamically. The call flow would look something like this: xdp_fd = bpf_prog_get_fd_by_id(id); trace_obj = bpf_object__open_file("func.o", NULL); prog = bpf_object__find_program_by_title(trace_obj, "fentry/myfunc"); bpf_program__set_expected_attach_type(prog, BPF_TRACE_FENTRY); bpf_program__set_attach_target(prog, xdp_fd, "xdpfilt_blk_all"); bpf_object__load(trace_obj) Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/bpf/158220519486.127661.7964708960649051384.stgit@xdp-tutorial
2020-02-20libbpf: Bump libpf current version to v0.0.8Eelco Chaudron
New development cycles starts, bump to v0.0.8. Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/bpf/158220518424.127661.8278643006567775528.stgit@xdp-tutorial
2020-02-20libbpf: Relax check whether BTF is mandatoryAndrii Nakryiko
If BPF program is using BTF-defined maps, BTF is required only for libbpf itself to process map definitions. If after that BTF fails to be loaded into kernel (e.g., if it doesn't support BTF at all), this shouldn't prevent valid BPF program from loading. Existing retry-without-BTF logic for creating maps will succeed to create such maps without any problems. So, presence of .maps section shouldn't make BTF required for kernel. Update the check accordingly. Validated by ensuring simple BPF program with BTF-defined maps is still loaded on old kernel without BTF support and map is correctly parsed and created. Fixes: abd29c931459 ("libbpf: allow specifying map definitions using BTF") Reported-by: Julia Kartseva <hex@fb.com> Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200220062635.1497872-1-andriin@fb.com
2020-02-20selftests/bpf: Fix build of sockmap_ktls.cAlexei Starovoitov
The selftests fails to build with: tools/testing/selftests/bpf/prog_tests/sockmap_ktls.c: In function ‘test_sockmap_ktls_disconnect_after_delete’: tools/testing/selftests/bpf/prog_tests/sockmap_ktls.c:72:37: error: ‘TCP_ULP’ undeclared (first use in this function) 72 | err = setsockopt(cli, IPPROTO_TCP, TCP_ULP, "tls", strlen("tls")); | ^~~~~~~ Similar to commit that fixes build of sockmap_basic.c on systems with old /usr/include fix the build of sockmap_ktls.c Fixes: d1ba1204f2ee ("selftests/bpf: Test unhashing kTLS socket after removing from map") Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20200219205514.3353788-1-ast@kernel.org
2020-02-19selftests/bpf: Change llvm flag -mcpu=probe to -mcpu=v3Yonghong Song
The latest llvm supports cpu version v3, which is cpu version v1 plus some additional 64bit jmp insns and 32bit jmp insn support. In selftests/bpf Makefile, the llvm flag -mcpu=probe did runtime probe into the host system. Depending on compilation environments, it is possible that runtime probe may fail, e.g., due to memlock issue. This will cause generated code with cpu version v1. This may cause confusion as the same compiler and the same C code generates different byte codes in different environment. Let us change the llvm flag -mcpu=probe to -mcpu=v3 so the generated code will be the same regardless of the compilation environment. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200219004236.2291125-1-yhs@fb.com
2020-02-19Merge branch 'bpf_read_branch_records'Alexei Starovoitov
Daniel Xu says: ==================== Branch records are a CPU feature that can be configured to record certain branches that are taken during code execution. This data is particularly interesting for profile guided optimizations. perf has had branch record support for a while but the data collection can be a bit coarse grained. We (Facebook) have seen in experiments that associating metadata with branch records can improve results (after postprocessing). We generally use bpf_probe_read_*() to get metadata out of userspace. That's why bpf support for branch records is useful. Aside from this particular use case, having branch data available to bpf progs can be useful to get stack traces out of userspace applications that omit frame pointers. Changes in v8: - Use globals instead of perf buffer - Call test_perf_branches__detach() before destroying skeleton - Fix typo in docs Changes in v7: - Const-ify and static-ify local var - Documentation formatting Changes in v6: - Move #ifdef a little to avoid unused variable warnings on !x86 - Test negative condition in selftest (-EINVAL on improperly configured perf event) - Skip positive condition selftest on setups that don't support branch records Changes in v5: - Rename bpf_perf_prog_read_branches() -> bpf_read_branch_records() - Rename BPF_F_GET_BR_SIZE -> BPF_F_GET_BRANCH_RECORDS_SIZE - Squash tools/ bpf.h sync into selftest commit Changes in v4: - Add BPF_F_GET_BR_SIZE flag - Return -ENOENT on unsupported architectures - Only accept initialized memory in helper - Check buffer size is multiple of sizeof(struct perf_branch_entry) - Use bpf skeleton in selftest - Add commit messages - Spelling and formatting Changes in v3: - Document filling unused buffer with zero - Formatting fixes - Rebase Changes in v2: - Change to a bpf helper instead of context access - Avoid mentioning Intel specific things ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-02-19selftests/bpf: Add bpf_read_branch_records() selftestDaniel Xu
Add a selftest to test: * default bpf_read_branch_records() behavior * BPF_F_GET_BRANCH_RECORDS_SIZE flag behavior * error path on non branch record perf events * using helper to write to stack * using helper to write to global On host with hardware counter support: # ./test_progs -t perf_branches #27/1 perf_branches_hw:OK #27/2 perf_branches_no_hw:OK #27 perf_branches:OK Summary: 1/2 PASSED, 0 SKIPPED, 0 FAILED On host without hardware counter support (VM): # ./test_progs -t perf_branches #27/1 perf_branches_hw:OK #27/2 perf_branches_no_hw:OK #27 perf_branches:OK Summary: 1/2 PASSED, 1 SKIPPED, 0 FAILED Also sync tools/include/uapi/linux/bpf.h. Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200218030432.4600-3-dxu@dxuuu.xyz
2020-02-19bpf: Add bpf_read_branch_records() helperDaniel Xu
Branch records are a CPU feature that can be configured to record certain branches that are taken during code execution. This data is particularly interesting for profile guided optimizations. perf has had branch record support for a while but the data collection can be a bit coarse grained. We (Facebook) have seen in experiments that associating metadata with branch records can improve results (after postprocessing). We generally use bpf_probe_read_*() to get metadata out of userspace. That's why bpf support for branch records is useful. Aside from this particular use case, having branch data available to bpf progs can be useful to get stack traces out of userspace applications that omit frame pointers. Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200218030432.4600-2-dxu@dxuuu.xyz
2020-02-19Merge branch 'bpf-skmsg-simplify-restore'Daniel Borkmann
Jakub Sitnicki says: ==================== This series has been split out from "Extend SOCKMAP to store listening sockets" [0]. I think it stands on its own, and makes the latter series smaller, which will make the review easier, hopefully. The essence is that we don't need to do a complicated dance in sk_psock_restore_proto, if we agree that the contract with tcp_update_ulp is to restore callbacks even when the socket doesn't use ULP. This is what tcp_update_ulp currently does, and we just make use of it. Series is accompanied by a test for a particularly tricky case of restoring callbacks when we have both sockmap and tls callbacks configured in sk->sk_prot. [0] https://lore.kernel.org/bpf/20200127131057.150941-1-jakub@cloudflare.com/ ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2020-02-19selftests/bpf: Test unhashing kTLS socket after removing from mapJakub Sitnicki
When a TCP socket gets inserted into a sockmap, its sk_prot callbacks get replaced with tcp_bpf callbacks built from regular tcp callbacks. If TLS gets enabled on the same socket, sk_prot callbacks get replaced once again, this time with kTLS callbacks built from tcp_bpf callbacks. Now, we allow removing a socket from a sockmap that has kTLS enabled. After removal, socket remains with kTLS configured. This is where things things get tricky. Since the socket has a set of sk_prot callbacks that are a mix of kTLS and tcp_bpf callbacks, we need to restore just the tcp_bpf callbacks to the original ones. At the moment, it comes down to the the unhash operation. We had a regression recently because tcp_bpf callbacks were not cleared in this particular scenario of removing a kTLS socket from a sockmap. It got fixed in commit 4da6a196f93b ("bpf: Sockmap/tls, during free we may call tcp_bpf_unhash() in loop"). Add a test that triggers the regression so that we don't reintroduce it in the future. Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20200217121530.754315-4-jakub@cloudflare.com
2020-02-19bpf, sk_msg: Don't clear saved sock proto on restoreJakub Sitnicki
There is no need to clear psock->sk_proto when restoring socket protocol callbacks in sk->sk_prot. The psock is about to get detached from the sock and eventually destroyed. At worst we will restore the protocol callbacks and the write callback twice. This makes reasoning about psock state easier. Once psock is initialized, we can count on psock->sk_proto always being set. Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20200217121530.754315-3-jakub@cloudflare.com
2020-02-19bpf, sk_msg: Let ULP restore sk_proto and write_space callbackJakub Sitnicki
We don't need a fallback for when the socket is not using ULP. tcp_update_ulp handles this case exactly the same as we do in sk_psock_restore_proto. Get rid of the duplicated code. Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20200217121530.754315-2-jakub@cloudflare.com
2020-02-18bpf: Allow bpf_perf_event_read_value in all BPF programsSong Liu
bpf_perf_event_read_value() is NMI safe. Enable it for all BPF programs. This can be used in fentry/fexit to profile BPF program and individual kernel function with hardware counters. Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200214234146.2910011-1-songliubraving@fb.com
2020-02-17net: ena: remove set but not used variable 'hash_key'YueHaibing
drivers/net/ethernet/amazon/ena/ena_com.c: In function ena_com_hash_key_allocate: drivers/net/ethernet/amazon/ena/ena_com.c:1070:50: warning: variable hash_key set but not used [-Wunused-but-set-variable] commit 6a4f7dc82d1e ("net: ena: rss: do not allocate key when not supported") introduced this, but not used, so remove it. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-17net: netlink: Replace zero-length array with flexible-array memberGustavo A. R. Silva
The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member[1][2], introduced in C99: struct foo { int stuff; struct boo array[]; }; By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kind of undefined behavior bugs from being inadvertently introduced[3] to the codebase from now on. Also, notice that, dynamic memory allocations won't be affected by this change: "Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero."[1] This issue was found with the help of Coccinelle. [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html [2] https://github.com/KSPP/linux/issues/21 [3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-17net: switchdev: Replace zero-length array with flexible-array memberGustavo A. R. Silva
The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member[1][2], introduced in C99: struct foo { int stuff; struct boo array[]; }; By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kind of undefined behavior bugs from being inadvertently introduced[3] to the codebase from now on. Also, notice that, dynamic memory allocations won't be affected by this change: "Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero."[1] This issue was found with the help of Coccinelle. [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html [2] https://github.com/KSPP/linux/issues/21 [3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-17bpf, sockmap: Replace zero-length array with flexible-array memberGustavo A. R. Silva
The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member[1][2], introduced in C99: struct foo { int stuff; struct boo array[]; }; By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kind of undefined behavior bugs from being inadvertently introduced[3] to the codebase from now on. Also, notice that, dynamic memory allocations won't be affected by this change: "Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero."[1] This issue was found with the help of Coccinelle. [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html [2] https://github.com/KSPP/linux/issues/21 [3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-17NFC: digital: Replace zero-length array with flexible-array memberGustavo A. R. Silva
The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member[1][2], introduced in C99: struct foo { int stuff; struct boo array[]; }; By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kind of undefined behavior bugs from being inadvertently introduced[3] to the codebase from now on. Also, notice that, dynamic memory allocations won't be affected by this change: "Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero."[1] This issue was found with the help of Coccinelle. [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html [2] https://github.com/KSPP/linux/issues/21 [3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-17net: usb: cdc-phonet: Replace zero-length array with flexible-array memberGustavo A. R. Silva
The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member[1][2], introduced in C99: struct foo { int stuff; struct boo array[]; }; By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kind of undefined behavior bugs from being inadvertently introduced[3] to the codebase from now on. Also, notice that, dynamic memory allocations won't be affected by this change: "Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero."[1] This issue was found with the help of Coccinelle. [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html [2] https://github.com/KSPP/linux/issues/21 [3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-17net: phy: allow bcm84881 to be a moduleRussell King
Now that the phylib module loading issue has been resolved, we can allow this PHY driver to be built as a module. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-17Merge branch 'net-smc-next'David S. Miller
Ursula Braun says: ==================== net/smc: patches 2020-02-17 here are patches for SMC making termination tasks more perfect. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-17net/smc: reduce port_event schedulingUrsula Braun
IB event handlers schedule the port event worker for further processing of port state changes. This patch reduces the number of schedules to avoid duplicate processing of the same port change. Reviewed-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-17net/smc: simplify normal link terminationKarsten Graul
smc_lgr_terminate() and smc_lgr_terminate_sched() both result in soft link termination, smc_lgr_terminate_sched() is scheduling a worker for this task. Take out complexity by always using the termination worker and getting rid of smc_lgr_terminate() completely. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-17net/smc: remove unused parameter of smc_lgr_terminate()Karsten Graul
The soft parameter of smc_lgr_terminate() is not used and obsolete. Remove it. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-17net/smc: do not delete lgr from list twiceKarsten Graul
When 2 callers call smc_lgr_terminate() at the same time for the same lgr, one gets the lgr_lock and deletes the lgr from the list and releases the lock. Then the second caller gets the lock and tries to delete it again. In smc_lgr_terminate() add a check if the link group lgr is already deleted from the link group list and prevent to try to delete it a second time. And add a check if the lgr is marked as freeing, which means that a termination is already pending. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-17net/smc: use termination worker under send_lockKarsten Graul
smc_tx_rdma_write() is called under the send_lock and should not call smc_lgr_terminate() directly. Call smc_lgr_terminate_sched() instead which schedules a worker. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-17net/smc: improve smc_lgr_cleanup()Karsten Graul
smc_lgr_cleanup() is called during termination processing, there is no need to send a DELETE_LINK at that time. A DELETE_LINK should have been sent before the termination is initiated, if needed. And remove the extra call to wake_up(&lnk->wr_reg_wait) because smc_llc_link_inactive() already calls the related helper function smc_wr_wakeup_reg_wait(). Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-17Merge branch 'mlxsw-Reduce-dependency-between-bridge-and-router-code'David S. Miller
Ido Schimmel says: ==================== mlxsw: Reduce dependency between bridge and router code This patch set reduces the dependency between the bridge and the router code in preparation for RTNL removal from the route insertion path in mlxsw. The motivation and solution are explained in detail in patch #3. The main idea is that we need to stop special-casing the VXLAN devices with regards to the reference counting of the FIDs. Otherwise, we can bump into the situation described in patch #3, where the routing code calls into the bridge code which calls back into the routing code. After adding a mutex to protect router data structures to remove RTNL dependency, this can result in an AA deadlock. Patches #1 and #2 are preparations. They convert the FIDs to use 'refcount_t' for reference counting in order to catch over/under flows and add extack to the bridge creation function. Patches #3-#5 reduce the dependency between the bridge and the router code. First, by having the VXLAN device take a reference on the FID in patch #3 and then by removing unnecessary code following the change in patch #3. Patches #6-#10 adjust existing selftests and add new ones to exercise the new code paths. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-17selftests: mlxsw: vxlan: Add test for error pathIdo Schimmel
Test that when two VXLAN tunnels with conflicting configurations (i.e., different TTL) are enslaved to the same VLAN-aware bridge, then the enslavement of a port to the bridge is denied. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-17selftests: mlxsw: vxlan: Adjust test to recent changesIdo Schimmel
After recent changes, the VXLAN tunnel will be offloaded regardless if any local ports are member in the FID or not. Adjust the test to make sure the tunnel is offloaded in this case. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-17selftests: mlxsw: extack: Test creation of multiple VLAN-aware bridgesIdo Schimmel
The driver supports a single VLAN-aware bridge. Test that the enslavement of a port to the second VLAN-aware bridge fails with an extack. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-17selftests: mlxsw: extack: Test bridge creation with VXLANIdo Schimmel
Test that creation of a bridge (both VLAN-aware and VLAN-unaware) fails with an extack when a VXLAN device with an unsupported configuration is already enslaved to it. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-17selftests: mlxsw: Remove deprecated testIdo Schimmel
The addition of a VLAN on a bridge slave prompts the driver to have the local port in question join the FID corresponding to this VLAN. Before recent changes, the operation of joining the FID would also mean that the driver would enable VXLAN tunneling if a VXLAN device was also member in the VLAN. In case the configuration of the VXLAN tunnel was not supported, an extack error would be returned. Since the operation of joining the FID no longer means that VXLAN tunneling is potentially enabled, the test is no longer relevant. Remove it. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-17mlxsw: spectrum: Reduce dependency between bridge and router codeIdo Schimmel
Commit f40be47a3e40 ("mlxsw: spectrum_router: Do not force specific configuration order") added a call from the routing code to the bridge code in order to handle the case where VNI should be set on a FID following the joining of the router port to the FID. This is no longer required, as previous patches made VXLAN devices explicitly take a reference on the FID and set VNI on it. Therefore, remove the unnecessary call and simply have the RIF take a reference on the FID without checking if VNI should also be set on it. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-17mlxsw: spectrum_switchdev: Remove VXLAN checks during FID membershipIdo Schimmel
As explained in previous patch, VXLAN devices now take a reference on the FID and not only local ports. Therefore, there is no need for local ports to check if they need to set a VNI on the FID when they join the FID. Remove these unnecessary checks. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-17mlxsw: spectrum_switchdev: Have VXLAN device take reference on FIDIdo Schimmel
Up until now only local ports and the router port (which is also a local port) took a reference on the corresponding FID (Filtering Identifier) when joining a bridge. For example: 192.0.2.1/24 br0 | +------+------+ | | swp1 vxlan0 In this case the reference count of the FID will be '2'. Since the VXLAN device does not take a reference on the FID, whenever a local port joins the bridge it needs to check if a VXLAN device is already enslaved. If the VXLAN device should be mapped to the FID in question, then the VXLAN device's VNI is set on the FID. Beside the fact that this scheme special-cases the VXLAN device, it also creates an unnecessary dependency between the routing and bridge code: 1. [R] IP address is added on 'br0', which prompts the creation of a RIF and a backing FID 2. [B] VNI is enabled on backing FID 3. [R] Host route corresponding to VXLAN device's source address is promoted to perform NVE decapsulation [R] - Routing code [B] - Bridge code This back and forth dependency will become problematic when a lock is added in the routing code instead of relying on RTNL, as it will result in an AA deadlock. Instead, have the VXLAN device take a reference on the FID just like all the other netdev members of the bridge. In order to correctly handle the case where VXLAN devices are already enslaved to the bridge when it is offloaded, walk the bridge's slaves and replay the configuration. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-17mlxsw: spectrum_switchdev: Propagate extack to bridge creation functionIdo Schimmel
Propagate extack to bridge creation function so that error messages could be passed to user space via netlink instead of printing them to kernel log. A subsequent patch will pass the new extack argument to more functions. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-17mlxsw: spectrum_fid: Use 'refcount_t' for FID reference countingIdo Schimmel
'refcount_t' is very useful for catching over/under flows. Convert the FID (Filtering Identifier) objects to use it instead of 'unsigned int' for their reference count. A subsequent patch in the series will change the way VXLAN devices hold / release the FID reference, which is why the conversion is made now. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-17net: bridge: teach ndo_dflt_bridge_getlink() more brport flagsJulian Wiedmann
This enables ndo_dflt_bridge_getlink() to report a bridge port's offload settings for multicast and broadcast flooding. CC: Roopa Prabhu <roopa@cumulusnetworks.com> CC: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-17Merge branch 'sfc-couple-more-ARFS-tidy-ups'David S. Miller
Edward Cree says: ==================== couple more ARFS tidy-ups Tie up some loose ends from the recent ARFS work. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-17sfc: move some ARFS code out of headersEdward Cree
efx_filter_rfs_expire() is a work-function, so it being inline makes no sense. It's only ever used in efx_channels.c, so move it there. While we're at it, clean out some related unused cruft. Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-17sfc: only schedule asynchronous filter work if neededEdward Cree
Prevent excessive CPU time spent running a workitem with nothing to do. We avoid any races by keeping the same check in efx_filter_rfs_expire(). Suggested-by: Martin Habets <mhabets@solarflare.com> Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-17net: vlan: suppress "failed to kill vid" warningsJulian Wiedmann
When a real dev unregisters, vlan_device_event() also unregisters all of its vlan interfaces. For each VID this ends up in __vlan_vid_del(), which attempts to remove the VID from the real dev's VLAN filter. But the unregistering real dev might no longer be able to issue the required IOs, and return an error. Subsequently we raise a noisy warning msg that is not appropriate for this situation: the real dev is being torn down anyway, there shouldn't be any worry about cleanly releasing all of its HW-internal resources. So to avoid scaring innocent users, suppress this warning when the failed deletion happens on an unregistering device. While at it also convert the raw pr_warn() to a more fitting netdev_warn(). Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-17net: stmmac: Get rid of custom STMMAC_DEVICE() macroAndy Shevchenko
Since PCI core provides a generic PCI_DEVICE_DATA() macro, replace STMMAC_DEVICE() with former one. No functional change intended. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-17Merge branch 'Remove-rtnl-lock-dependency-from-flow_action-infra'David S. Miller
Vlad Buslov says: ==================== Remove rtnl lock dependency from flow_action infra Currently, TC flow_action infrastructure code obtain rtnl lock before accessing action state in tc_setup_flow_action() function and releases it afterwards. This behavior is not supposed to impact TC filter insertion rate because filling flow_action representation is only a small part of creating new filter and expensive operations (hardware offload callbacks, classifiers, cls API code that creates chains and classifiers instances) already support unlocked execution. However, typical vswitch implementation might need to also dump TC filters concurrently, for example to age out unused flows or update flow counters. TC dump is fully serialized and holds rtnl lock during its whole execution in kernel space. As such, it can significantly impact concurrent tasks that try to intermittently obtain rtnl lock when filling intermediate representation for new filter offload (performance evaluation at the end of this mail). Refactor flow_action cls API infrastructure and its dependencies to not rely on rtnl lock for synchronization. Patch set overview: - Refactor tc_setup_flow_action() to obtain action tcf_lock when accessing action state. Fix its dependencies to not obtain tcf_lock themselves and assume that caller already holds it (needs to be done in same patch to prevent deadlock) and not to call sleeping functions (needs to be done in same patch to prevent "sleeping while atomic" dmesg warnings). - Refactor action helper functions to require tcf_lock instead of rtnl. Internally, all of the actions already use tcf_lock for synchronization to accommodate unlocked classifier API, so this change relies on already existing functionality. - Remove rtnl lock and "rtnl_held" argument from tc_setup_flow_action() function. To test the change, multiple concurrent TC instances are invoked with following command: time ls add* | xargs -n 1 -P 100 sudo tc -b Ten batch files with following typical rules (100k each) are used: filter add dev ens1f0_0 protocol ip ingress prio 1 handle 1 flower src_mac e4:11:0:0:0:0 dst_mac e4:12:0:0:0:0 src_ip 192.168.111.1 dst_ip 192.168.111.2 ip_proto udp dst_port 1 src_port 1 action tunnel_key set id 1 src_ip 2.2.2.2 dst_ip 2.2.2.3 dst_port 4789 no_percpu action mirred egress redirect dev vxlan1 no_percpu TC dump of same device is called in infinite loop from five concurrent instances: while true do tc -s filter show dev $NIC ingress >/dev/null done Results obtained on current net-next commit 9f68e3655aae ("Merge tag 'drm-next-2020-01-30' of git://anongit.freedesktop.org/drm/drm"): | net-next | this change ---------------+----------+------------- TC add | 6.3s | 6.3s TC add + dump | 29.3s | 6.8s Test results confirm significant impact of concurrent TC dump. The impact is almost fully mitigated by proposed change (differences can be attributed to contention for chain and tp locks between add and dump TC instances). ==================== Signed-off-by: David S. Miller <davem@davemloft.net>