author | Jakub Kicinski | 2023-03-06 20:36:39 -0800 |
---|---|---|
committer | Jakub Kicinski | 2023-03-06 20:36:39 -0800 |
commit | 36e5e391a25af28dc1f4586f95d577b38ff4ed72 (patch) | |
tree | 3b58175e4a148f54338d22c926d7dd2a6283d317 /Documentation | |
parent | 5ca26d6039a6b42341f7f5cc8d10d30ca1561a7b (diff) | |
parent | 8f4c92f0024ff2a30f002e85f87e531d49dc023c (diff) |
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:
====================
pull-request: bpf-next 2023-03-06
We've added 85 non-merge commits during the last 13 day(s) which contain
a total of 131 files changed, 7102 insertions(+), 1792 deletions(-).
The main changes are:
1) Add skb and XDP typed dynptrs which allow BPF programs for more
ergonomic and less brittle iteration through data and variable-sized
accesses, from Joanne Koong.
2) Bigger batch of BPF verifier improvements to prepare for upcoming BPF
open-coded iterators allowing for less restrictive looping capabilities,
from Andrii Nakryiko.
3) Rework RCU enforcement in the verifier, add kptr_rcu and enforce BPF
programs to NULL-check before passing such pointers into kfunc,
from Alexei Starovoitov.
4) Add support for kptrs in percpu hashmaps, percpu LRU hashmaps and in
local storage maps, from Kumar Kartikeya Dwivedi.
5) Add BPF verifier support for ST instructions in convert_ctx_access()
which will help new -mcpu=v4 clang flag to start emitting them,
from Eduard Zingerman.
6) Make uprobe attachment Android APK aware by supporting attachment
to functions inside ELF objects contained in APKs via function names,
from Daniel Müller.
7) Add a new BPF_F_TIMER_ABS flag for bpf_timer_start() helper
to start the timer with absolute expiration value instead of relative
one, from Tero Kristo.
8) Add a new kfunc bpf_cgroup_from_id() to look up cgroups via id,
from Tejun Heo.
9) Extend libbpf to support users manually attaching kprobes/uprobes
in the legacy/perf/link mode, from Menglong Dong.
10) Implement workarounds in the mips BPF JIT for DADDI/R4000,
from Jiaxun Yang.
11) Enable mixing bpf2bpf and tailcalls for the loongarch BPF JIT,
from Hengqi Chen.
12) Extend BPF instruction set doc with describing the encoding of BPF
instructions in terms of how bytes are stored under big/little endian,
from Jose E. Marchesi.
13) Follow-up to enable kfunc support for riscv BPF JIT, from Pu Lehui.
14) Fix bpf_xdp_query() backwards compatibility on old kernels,
from Yonghong Song.
15) Fix BPF selftest cross compilation with CLANG_CROSS_FLAGS,
from Florent Revest.
16) Improve bpf_cpumask_ma to only allocate one bpf_mem_cache,
from Hou Tao.
17) Fix BPF verifier's check_subprogs to not unnecessarily mark
a subprogram with has_tail_call, from Ilya Leoshkevich.
18) Fix arm syscall regs spec in libbpf's bpf_tracing.h, from Puranjay Mohan.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (85 commits)
selftests/bpf: Add test for legacy/perf kprobe/uprobe attach mode
selftests/bpf: Split test_attach_probe into multi subtests
libbpf: Add support to set kprobe/uprobe attach mode
tools/resolve_btfids: Add /libsubcmd to .gitignore
bpf: add support for fixed-size memory pointer returns for kfuncs
bpf: generalize dynptr_get_spi to be usable for iters
bpf: mark PTR_TO_MEM as non-null register type
bpf: move kfunc_call_arg_meta higher in the file
bpf: ensure that r0 is marked scratched after any function call
bpf: fix visit_insn()'s detection of BPF_FUNC_timer_set_callback helper
bpf: clean up visit_insn()'s instruction processing
selftests/bpf: adjust log_fixup's buffer size for proper truncation
bpf: honor env->test_state_freq flag in is_state_visited()
selftests/bpf: enhance align selftest's expected log matching
bpf: improve regsafe() checks for PTR_TO_{MEM,BUF,TP_BUFFER}
bpf: improve stack slot state printing
selftests/bpf: Disassembler tests for verifier.c:convert_ctx_access()
selftests/bpf: test if pointer type is tracked for BPF_ST_MEM
bpf: allow ctx writes using BPF_ST_MEM instruction
bpf: Use separate RCU callbacks for freeing selem
...
====================
Link: https://lore.kernel.org/r/20230307004346.27578-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Diffstat (limited to 'Documentation')
-rw-r--r-- | Documentation/bpf/bpf_design_QA.rst | 4 |
-rw-r--r-- | Documentation/bpf/bpf_devel_QA.rst | 14 |
-rw-r--r-- | Documentation/bpf/cpumasks.rst | 4 |
-rw-r--r-- | Documentation/bpf/instruction-set.rst | 40 |
-rw-r--r-- | Documentation/bpf/kfuncs.rst | 41 |
-rw-r--r-- | Documentation/bpf/maps.rst | 7 |
6 files changed, 74 insertions, 36 deletions
diff --git a/Documentation/bpf/bpf_design_QA.rst b/Documentation/bpf/bpf_design_QA.rst
index bfff0e7e37c2..38372a956d65 100644
--- a/Documentation/bpf/bpf_design_QA.rst
+++ b/Documentation/bpf/bpf_design_QA.rst
@@ -314,7 +314,7 @@ Q: What is the compatibility story for special BPF types in map values?
 Q: Users are allowed to embed bpf_spin_lock, bpf_timer fields in their BPF map
 values (when using BTF support for BPF maps). This allows to use helpers for
 such objects on these fields inside map values. Users are also allowed to embed
-pointers to some kernel types (with __kptr and __kptr_ref BTF tags). Will the
+pointers to some kernel types (with __kptr_untrusted and __kptr BTF tags). Will the
 kernel preserve backwards compatibility for these features?

 A: It depends. For bpf_spin_lock, bpf_timer: YES, for kptr and everything else:
@@ -324,7 +324,7 @@ For struct types that have been added already, like bpf_spin_lock and bpf_timer,
 the kernel will preserve backwards compatibility, as they are part of UAPI.

 For kptrs, they are also part of UAPI, but only with respect to the kptr
-mechanism. The types that you can use with a __kptr and __kptr_ref tagged
+mechanism. The types that you can use with a __kptr_untrusted and __kptr tagged
 pointer in your struct are NOT part of the UAPI contract. The supported types can
 and will change across kernel releases. However, operations like accessing kptr
 fields and bpf_kptr_xchg() helper will continue to be supported across kernel
diff --git a/Documentation/bpf/bpf_devel_QA.rst b/Documentation/bpf/bpf_devel_QA.rst
index 03d4993eda6f..5f5f9ccc3862 100644
--- a/Documentation/bpf/bpf_devel_QA.rst
+++ b/Documentation/bpf/bpf_devel_QA.rst
@@ -128,7 +128,7 @@ into the bpf-next tree will make their way into net-next tree. net and
 net-next are both run by David S. Miller. From there, they will go
 into the kernel mainline tree run by Linus Torvalds. To read up on the
 process of net and net-next being merged into the mainline tree, see
-the :ref:`netdev-FAQ`
+the `netdev-FAQ`_.



@@ -147,7 +147,7 @@ request)::
 Q: How do I indicate which tree (bpf vs. bpf-next) my patch should be applied to?
 ---------------------------------------------------------------------------------

-A: The process is the very same as described in the :ref:`netdev-FAQ`,
+A: The process is the very same as described in the `netdev-FAQ`_,
 so please read up on it. The subject line must indicate whether the
 patch is a fix or rather "next-like" content in order to let the
 maintainers know whether it is targeted at bpf or bpf-next.
@@ -206,7 +206,7 @@ ii) run extensive BPF test suite and
 Once the BPF pull request was accepted by David S. Miller, then
 the patches end up in net or net-next tree, respectively, and
 make their way from there further into mainline. Again, see the
-:ref:`netdev-FAQ` for additional information e.g. on how often they are
+`netdev-FAQ`_ for additional information e.g. on how often they are
 merged to mainline.

 Q: How long do I need to wait for feedback on my BPF patches?
@@ -230,7 +230,7 @@ Q: Are patches applied to bpf-next when the merge window is open?
 -----------------------------------------------------------------
 A: For the time when the merge window is open, bpf-next will not be
 processed. This is roughly analogous to net-next patch processing,
-so feel free to read up on the :ref:`netdev-FAQ` about further details.
+so feel free to read up on the `netdev-FAQ`_ about further details.

 During those two weeks of merge window, we might ask you to resend
 your patch series once bpf-next is open again. Once Linus released
@@ -394,7 +394,7 @@ netdev kernel mailing list in Cc and ask for the fix to be queued up:
   netdev@vger.kernel.org

 The process in general is the same as on netdev itself, see also the
-:ref:`netdev-FAQ`.
+`netdev-FAQ`_.

 Q: Do you also backport to kernels not currently maintained as stable?
 ----------------------------------------------------------------------
@@ -410,7 +410,7 @@ Q: The BPF patch I am about to submit needs to go to stable as well
 What should I do?

 A: The same rules apply as with netdev patch submissions in general, see
-the :ref:`netdev-FAQ`.
+the `netdev-FAQ`_.

 Never add "``Cc: stable@vger.kernel.org``" to the patch description, but
 ask the BPF maintainers to queue the patches instead. This can be done
@@ -685,7 +685,7 @@ when:

 .. Links
 .. _Documentation/process/: https://www.kernel.org/doc/html/latest/process/
-.. _netdev-FAQ: Documentation/process/maintainer-netdev.rst
+.. _netdev-FAQ: https://www.kernel.org/doc/html/latest/process/maintainer-netdev.html
 .. _selftests:
    https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/testing/selftests/bpf/
 .. _Documentation/dev-tools/kselftest.rst:
diff --git a/Documentation/bpf/cpumasks.rst b/Documentation/bpf/cpumasks.rst
index 24bef9cbbeee..75344cd230e5 100644
--- a/Documentation/bpf/cpumasks.rst
+++ b/Documentation/bpf/cpumasks.rst
@@ -51,7 +51,7 @@ For example:
 .. code-block:: c

 	struct cpumask_map_value {
-		struct bpf_cpumask __kptr_ref * cpumask;
+		struct bpf_cpumask __kptr * cpumask;
 	};

 	struct array_map {
@@ -128,7 +128,7 @@ Here is an example of a ``struct bpf_cpumask *`` being retrieved from a map:

 	/* struct containing the struct bpf_cpumask kptr which is stored in the map. */
 	struct cpumasks_kfunc_map_value {
-		struct bpf_cpumask __kptr_ref * bpf_cpumask;
+		struct bpf_cpumask __kptr * bpf_cpumask;
 	};

 	/* The map containing struct cpumasks_kfunc_map_value entries. */
diff --git a/Documentation/bpf/instruction-set.rst b/Documentation/bpf/instruction-set.rst
index af515de5fc38..db8789e6969e 100644
--- a/Documentation/bpf/instruction-set.rst
+++ b/Documentation/bpf/instruction-set.rst
@@ -38,14 +38,11 @@ eBPF has two instruction encodings:
 * the wide instruction encoding, which appends a second 64-bit immediate (i.e.,
   constant) value after the basic instruction for a total of 128 bits.

-The basic instruction encoding is as follows, where MSB and LSB mean the most significant
-bits and least significant bits, respectively:
+The fields conforming an encoded basic instruction are stored in the
+following order::

-============= ======= ======= ======= ============
-32 bits (MSB) 16 bits 4 bits  4 bits  8 bits (LSB)
-============= ======= ======= ======= ============
-imm           offset  src_reg dst_reg opcode
-============= ======= ======= ======= ============
+  opcode:8 src_reg:4 dst_reg:4 offset:16 imm:32 // In little-endian BPF.
+  opcode:8 dst_reg:4 src_reg:4 offset:16 imm:32 // In big-endian BPF.

 **imm**
   signed integer immediate value
@@ -63,6 +60,18 @@ imm offset src_reg dst_reg opcode
 **opcode**
   operation to perform

+Note that the contents of multi-byte fields ('imm' and 'offset') are
+stored using big-endian byte ordering in big-endian BPF and
+little-endian byte ordering in little-endian BPF.
+
+For example::
+
+  opcode                  offset imm          assembly
+         src_reg dst_reg
+  07     0       1        00 00  44 33 22 11  r1 += 0x11223344 // little
+         dst_reg src_reg
+  07     1       0        00 00  11 22 33 44  r1 += 0x11223344 // big
+
 Note that most instructions do not use all of the fields.
 Unused fields shall be cleared to zero.

@@ -72,18 +81,23 @@ The 64 bits following the basic instruction contain a pseudo instruction
 using the same format but with opcode, dst_reg, src_reg, and offset all set to zero,
 and imm containing the high 32 bits of the immediate value.

-================= ==================
-64 bits (MSB)     64 bits (LSB)
-================= ==================
-basic instruction pseudo instruction
-================= ==================
+This is depicted in the following figure::
+
+        basic_instruction
+  .-----------------------------.
+  |                             |
+  code:8 regs:8 offset:16 imm:32 unused:32 imm:32
+                                 |              |
+                                 '--------------'
+                                pseudo instruction

 Thus the 64-bit immediate value is constructed as follows:

   imm64 = (next_imm << 32) | imm

 where 'next_imm' refers to the imm value of the pseudo instruction
-following the basic instruction.
+following the basic instruction. The unused bytes in the pseudo
+instruction are reserved and shall be cleared to zero.

 Instruction classes
 -------------------
diff --git a/Documentation/bpf/kfuncs.rst b/Documentation/bpf/kfuncs.rst
index ca96ef3f6896..69eccf6f98ef 100644
--- a/Documentation/bpf/kfuncs.rst
+++ b/Documentation/bpf/kfuncs.rst
@@ -100,6 +100,23 @@ Hence, whenever a constant scalar argument is accepted by a kfunc which is not a
 size parameter, and the value of the constant matters for program safety, __k
 suffix should be used.

+2.2.2 __uninit Annotation
+-------------------------
+
+This annotation is used to indicate that the argument will be treated as
+uninitialized.
+
+An example is given below::
+
+        __bpf_kfunc int bpf_dynptr_from_skb(..., struct bpf_dynptr_kern *ptr__uninit)
+        {
+        ...
+        }
+
+Here, the dynptr will be treated as an uninitialized dynptr. Without this
+annotation, the verifier will reject the program if the dynptr passed in is
+not initialized.
+
 .. _BPF_kfunc_nodef:

 2.3 Using an existing kernel function
@@ -232,11 +249,13 @@ added later.
 2.4.8 KF_RCU flag
 -----------------

-The KF_RCU flag is used for kfuncs which have a rcu ptr as its argument.
-When used together with KF_ACQUIRE, it indicates the kfunc should have a
-single argument which must be a trusted argument or a MEM_RCU pointer.
-The argument may have reference count of 0 and the kfunc must take this
-into consideration.
+The KF_RCU flag is a weaker version of KF_TRUSTED_ARGS. The kfuncs marked with
+KF_RCU expect either PTR_TRUSTED or MEM_RCU arguments. The verifier guarantees
+that the objects are valid and there is no use-after-free. The pointers are not
+NULL, but the object's refcount could have reached zero. The kfuncs need to
+consider doing refcnt != 0 check, especially when returning a KF_ACQUIRE
+pointer. Note as well that a KF_ACQUIRE kfunc that is KF_RCU should very likely
+also be KF_RET_NULL.

 .. _KF_deprecated_flag:

@@ -527,7 +546,7 @@ Here's an example of how it can be used:

 	/* struct containing the struct task_struct kptr which is actually stored in the map. */
 	struct __cgroups_kfunc_map_value {
-		struct cgroup __kptr_ref * cgroup;
+		struct cgroup __kptr * cgroup;
 	};

 	/* The map containing struct __cgroups_kfunc_map_value entries. */
@@ -583,13 +602,17 @@ Here's an example of how it can be used:

 ----

-Another kfunc available for interacting with ``struct cgroup *`` objects is
-bpf_cgroup_ancestor(). This allows callers to access the ancestor of a cgroup,
-and return it as a cgroup kptr.
+Other kfuncs available for interacting with ``struct cgroup *`` objects are
+bpf_cgroup_ancestor() and bpf_cgroup_from_id(), allowing callers to access
+the ancestor of a cgroup and find a cgroup by its ID, respectively. Both
+return a cgroup kptr.

 .. kernel-doc:: kernel/bpf/helpers.c
    :identifiers: bpf_cgroup_ancestor

+.. kernel-doc:: kernel/bpf/helpers.c
+   :identifiers: bpf_cgroup_from_id
+
 Eventually, BPF should be updated to allow this to happen with a normal memory
 load in the program itself. This is currently not possible without more work in
 the verifier. bpf_cgroup_ancestor() can be used as follows:
diff --git a/Documentation/bpf/maps.rst b/Documentation/bpf/maps.rst
index 4906ff0f8382..6f069f3d6f4b 100644
--- a/Documentation/bpf/maps.rst
+++ b/Documentation/bpf/maps.rst
@@ -11,9 +11,9 @@ maps are accessed from BPF programs via BPF helpers which are documented in the
 `man-pages`_ for `bpf-helpers(7)`_.

 BPF maps are accessed from user space via the ``bpf`` syscall, which provides
-commands to create maps, lookup elements, update elements and delete
-elements. More details of the BPF syscall are available in
-:doc:`/userspace-api/ebpf/syscall` and in the `man-pages`_ for `bpf(2)`_.
+commands to create maps, lookup elements, update elements and delete elements.
+More details of the BPF syscall are available in `ebpf-syscall`_ and in the
+`man-pages`_ for `bpf(2)`_.

 Map Types
 =========
@@ -79,3 +79,4 @@ Find and delete element by key in a given map using ``attr->map_fd``,
 .. _man-pages: https://www.kernel.org/doc/man-pages/
 .. _bpf(2): https://man7.org/linux/man-pages/man2/bpf.2.html
 .. _bpf-helpers(7): https://man7.org/linux/man-pages/man7/bpf-helpers.7.html
+.. _ebpf-syscall: https://docs.kernel.org/userspace-api/ebpf/syscall.html