diff options
author | Dave Airlie | 2021-08-12 09:43:38 +1000 |
---|---|---|
committer | Dave Airlie | 2021-08-12 09:56:04 +1000 |
commit | 25fed6b324ac556859d6dd0b7827cc8fb653ca99 (patch) | |
tree | 3f634521c723d6434c921fc1adbefa0e7d7e4be7 /Documentation/gpu | |
parent | 59b9d6baa1bea254d31042c42bcb8f946c263bae (diff) | |
parent | 927dfdd09d8c03ba100ed0c8c3915f8e1d1f5556 (diff) |
Merge tag 'drm-intel-gt-next-2021-08-06-1' of ssh://git.freedesktop.org/git/drm/drm-intel into drm-next
UAPI Changes:
- Add I915_MMAP_OFFSET_FIXED
On devices with local memory `I915_MMAP_OFFSET_FIXED` is the only valid
type. On devices without local memory, this caching mode is invalid.
As caching mode when specifying `I915_MMAP_OFFSET_FIXED`, WC or WB will
be used, depending on the object placement on creation. WB will be used
when the object can only exist in system memory, WC otherwise.
Userspace: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11888
- Reinstate the mmap ioctl for (already released) integrated Gen12 platforms
Rationale: Otherwise media driver breaks eg. for ADL-P. Long term goal is
still to sunset the IOCTL even for integrated and require using mmap_offset.
- Reject caching/set_domain IOCTLs on discrete
Expected to become immutable property of the BO
- Disallow changing context parameters after first use on Gen12 and earlier
- Require setting context parameters at creation on platforms after Gen12
Rationale (for both): Allow less dynamic changes to the context to simplify
the implementation and avoid user shooting theirselves in the foot.
- Drop I915_CONTEXT_PARAM_RINGSIZE
Userspace PR for compute-driver has not been merged
- Drop I915_CONTEXT_PARAM_NO_ZEROMAP
Userspace PR for libdrm / Beignet was never landed
- Drop CONTEXT_CLONE API
Userspace PR for Mesa was never landed
- Drop getparam support for I915_CONTEXT_PARAM_ENGINES
Only existed for symmetry wrt. setparam, never used.
- Disallow bonding of virtual engines
Drop the prep work, no hardware has been released needing it.
- (Implicit) Disable gpu relocations
Media userspace was the last userspace to still use them. They
have converted so performance can be regained with an update.
Core Changes:
- Merge topic branch 'topic/i915-ttm-2021-06-11' (from Maarten)
- Merge topic branch 'topic/revid_steppings' (from Matt R)
- Merge topic branch 'topic/xehp-dg2-definitions-2021-07-21' (from Matt R)
- Backmerges drm-next (Rodrigo)
Driver Changes:
- Initial workarounds for ADL-P (Clint)
- Preliminary code for XeHP/DG2 (Stuart, Umesh, Matt R, Prathap, Ram,
Venkata, Akeem, Tvrtko, John, Lucas)
- Fix ADL-S DMA mask size to 39 bits (Tejas)
- Remove code for CNL (Lucas)
- Add ADL-P GuC/HuC firmwares (John)
- Update HuC to 7.9.3 for TGL/ADL-S/RKL (John)
- Fix -EDEADLK handling regression (Ville)
- Implement Wa_1508744258 for DG1 and Gen12 iGFX (Jose)
- Extend Wa_1406941453 to ADL-S (Jose)
- Drop unnecessary workarounds per stepping for SKL/BXT/ICL (Matt R)
- Use fuse info to enable SFC on Gen12 (Venkata)
- Unconditionally flush the pages on acquire on EHL/JSL (Matt A)
- Probe existence of backing struct pages upon userptr creation (Chris, Matt A)
- Add an intermediate GEM proto-context to delay real context creation (Jason)
- Implement SINGLE_TIMELINE with a syncobj (Jason)
- Set the watchdog timeout directly in intel_context_set_gem (Jason)
- Disallow userspace from creating contexts with too many engines (Jason)
- Revert "drm/i915/gem: Asynchronous cmdparser" (Jason)
- Revert "drm/i915: Propagate errors on awaiting already signaled fences" (Jason)
- Revert "drm/i915: Skip over MI_NOOP when parsing" (Jason)
- Revert "drm/i915: Shrink the GEM kmem_caches upon idling" (Daniel)
- Always let TTM handle object migration (Jason)
- Correct the locking and pin pattern for dma-buf (Thomas H, Michael R, Jason)
- Migrate to system at dma-buf attach time (Thomas, Michael R)
- MAJOR refactoring of the GuC backend code to allow for enabling on Gen11+
(Matt B, John, Michal Wa., Fernando, Daniele, Vinay)
- Update GuC firmware interface to v62.0.0 (John, Michal Wa., Matt B)
- Add GuCRC feature to hand over the control of HW RC6 to the GuC on
Gen12+ when GuC submission is enabled (Vinay, Sujaritha, Daniele,
John, Tvrtko)
- Use the correct IRQ during resume and eliminate DRM IRQ midlayer (Thomas Z)
- Add pipelined page migration and clearing (Chris, Thomas H)
- Use TTM for system memory on discrete (Thomas H)
- Implement object migration for display vs. dma-buf (Thomas H)
- Perform execbuffer object locking as a separate step (Thomas H)
- Add support for explicit L3BANK steering (Matt, Daniele)
- Remove duplicated call to ops->pread (Daniel)
- Fix pagefault disabling in the first execbuf slowpath (Daniel)
- Simplify userptr locking (Thomas H)
- Improvements to the GuC CTB code (Matt B, John)
- Make GT workaround upper bounds exclusive (Matt R)
- Check for nomodeset in i915_init() first (Daniel)
- Delete now unused gpu reloc code (Daniel)
- Document RFC plans for GuC submission, DRM scheduler and new parallel
submit uAPI (Matt B)
- Reintroduce buddy allocator this time with TTM (Matt A)
- Support forcing page size with LMEM (Matt A)
- Add i915_sched_engine to abstract a submission queue between backends (Matt B)
- Use accelerated move in TTM (Ram)
- Fix memory leaks from TTM backend (Thomas H)
- Introduce WW transaction helper (Thomas H)
- Improve debug Kconfig texts a bit (Daniel)
- Unify user object creation code (Jason)
- Use a table for i915_init/exit (Jason)
- Move slabs to module init/exit (Daniel)
- Remove now unused i915_globals (Daniel)
- Extract i915_module.c (Daniel)
- Consistently use adl-p/adl-s in WA comments (Jose)
- Finish INTEL_GEN and friends conversion (Lucas)
- Correct variable/function namings (Lucas)
- Code checker fixes (Wan, Matt A)
- Tracepoint improvements (Matt B)
- Kerneldoc improvements (Tvrtko, Jason, Matt A, Maarten)
- Selftest improvements (Chris, Matt A, Tejas, Thomas H, John, Matt B,
Rahul, Vinay)
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/YQ0JmYiXhGskNcrI@jlahtine-mobl.ger.corp.intel.com
Diffstat (limited to 'Documentation/gpu')
-rw-r--r-- | Documentation/gpu/i915.rst | 15 | ||||
-rw-r--r-- | Documentation/gpu/rfc/i915_parallel_execbuf.h | 122 | ||||
-rw-r--r-- | Documentation/gpu/rfc/i915_scheduler.rst | 148 | ||||
-rw-r--r-- | Documentation/gpu/rfc/index.rst | 4 |
4 files changed, 289 insertions, 0 deletions
diff --git a/Documentation/gpu/i915.rst b/Documentation/gpu/i915.rst index 42ce0196930a..204ebdaadb45 100644 --- a/Documentation/gpu/i915.rst +++ b/Documentation/gpu/i915.rst @@ -422,9 +422,16 @@ Batchbuffer Parsing User Batchbuffer Execution -------------------------- +.. kernel-doc:: drivers/gpu/drm/i915/gem/i915_gem_context_types.h + .. kernel-doc:: drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c :doc: User command execution +Scheduling +---------- +.. kernel-doc:: drivers/gpu/drm/i915/i915_scheduler_types.h + :functions: i915_sched_engine + Logical Rings, Logical Ring Contexts and Execlists -------------------------------------------------- @@ -518,6 +525,14 @@ GuC-based command submission .. kernel-doc:: drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c :doc: GuC-based command submission +GuC ABI +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +.. kernel-doc:: drivers/gpu/drm/i915/gt/uc/abi/guc_messages_abi.h +.. kernel-doc:: drivers/gpu/drm/i915/gt/uc/abi/guc_communication_mmio_abi.h +.. kernel-doc:: drivers/gpu/drm/i915/gt/uc/abi/guc_communication_ctb_abi.h +.. kernel-doc:: drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h + HuC --- .. kernel-doc:: drivers/gpu/drm/i915/gt/uc/intel_huc.c diff --git a/Documentation/gpu/rfc/i915_parallel_execbuf.h b/Documentation/gpu/rfc/i915_parallel_execbuf.h new file mode 100644 index 000000000000..8cbe2c4e0172 --- /dev/null +++ b/Documentation/gpu/rfc/i915_parallel_execbuf.h @@ -0,0 +1,122 @@ +/* SPDX-License-Identifier: MIT */ +/* + * Copyright © 2021 Intel Corporation + */ + +#define I915_CONTEXT_ENGINES_EXT_PARALLEL_SUBMIT 2 /* see i915_context_engines_parallel_submit */ + +/** + * struct drm_i915_context_engines_parallel_submit - Configure engine for + * parallel submission. + * + * Setup a slot in the context engine map to allow multiple BBs to be submitted + * in a single execbuf IOCTL. Those BBs will then be scheduled to run on the GPU + * in parallel. Multiple hardware contexts are created internally in the i915 + * run these BBs. Once a slot is configured for N BBs only N BBs can be + * submitted in each execbuf IOCTL and this is implicit behavior e.g. The user + * doesn't tell the execbuf IOCTL there are N BBs, the execbuf IOCTL knows how + * many BBs there are based on the slot's configuration. The N BBs are the last + * N buffer objects or first N if I915_EXEC_BATCH_FIRST is set. + * + * The default placement behavior is to create implicit bonds between each + * context if each context maps to more than 1 physical engine (e.g. context is + * a virtual engine). Also we only allow contexts of same engine class and these + * contexts must be in logically contiguous order. Examples of the placement + * behavior described below. Lastly, the default is to not allow BBs to + * preempted mid BB rather insert coordinated preemption on all hardware + * contexts between each set of BBs. Flags may be added in the future to change + * both of these default behaviors. + * + * Returns -EINVAL if hardware context placement configuration is invalid or if + * the placement configuration isn't supported on the platform / submission + * interface. + * Returns -ENODEV if extension isn't supported on the platform / submission + * interface. + * + * .. code-block:: none + * + * Example 1 pseudo code: + * CS[X] = generic engine of same class, logical instance X + * INVALID = I915_ENGINE_CLASS_INVALID, I915_ENGINE_CLASS_INVALID_NONE + * set_engines(INVALID) + * set_parallel(engine_index=0, width=2, num_siblings=1, + * engines=CS[0],CS[1]) + * + * Results in the following valid placement: + * CS[0], CS[1] + * + * Example 2 pseudo code: + * CS[X] = generic engine of same class, logical instance X + * INVALID = I915_ENGINE_CLASS_INVALID, I915_ENGINE_CLASS_INVALID_NONE + * set_engines(INVALID) + * set_parallel(engine_index=0, width=2, num_siblings=2, + * engines=CS[0],CS[2],CS[1],CS[3]) + * + * Results in the following valid placements: + * CS[0], CS[1] + * CS[2], CS[3] + * + * This can also be thought of as 2 virtual engines described by 2-D array + * in the engines the field with bonds placed between each index of the + * virtual engines. e.g. CS[0] is bonded to CS[1], CS[2] is bonded to + * CS[3]. + * VE[0] = CS[0], CS[2] + * VE[1] = CS[1], CS[3] + * + * Example 3 pseudo code: + * CS[X] = generic engine of same class, logical instance X + * INVALID = I915_ENGINE_CLASS_INVALID, I915_ENGINE_CLASS_INVALID_NONE + * set_engines(INVALID) + * set_parallel(engine_index=0, width=2, num_siblings=2, + * engines=CS[0],CS[1],CS[1],CS[3]) + * + * Results in the following valid and invalid placements: + * CS[0], CS[1] + * CS[1], CS[3] - Not logical contiguous, return -EINVAL + */ +struct drm_i915_context_engines_parallel_submit { + /** + * @base: base user extension. + */ + struct i915_user_extension base; + + /** + * @engine_index: slot for parallel engine + */ + __u16 engine_index; + + /** + * @width: number of contexts per parallel engine + */ + __u16 width; + + /** + * @num_siblings: number of siblings per context + */ + __u16 num_siblings; + + /** + * @mbz16: reserved for future use; must be zero + */ + __u16 mbz16; + + /** + * @flags: all undefined flags must be zero, currently not defined flags + */ + __u64 flags; + + /** + * @mbz64: reserved for future use; must be zero + */ + __u64 mbz64[3]; + + /** + * @engines: 2-d array of engine instances to configure parallel engine + * + * length = width (i) * num_siblings (j) + * index = j + i * num_siblings + */ + struct i915_engine_class_instance engines[0]; + +} __packed; + diff --git a/Documentation/gpu/rfc/i915_scheduler.rst b/Documentation/gpu/rfc/i915_scheduler.rst new file mode 100644 index 000000000000..cbda75065dad --- /dev/null +++ b/Documentation/gpu/rfc/i915_scheduler.rst @@ -0,0 +1,148 @@ +========================================= +I915 GuC Submission/DRM Scheduler Section +========================================= + +Upstream plan +============= +For upstream the overall plan for landing GuC submission and integrating the +i915 with the DRM scheduler is: + +* Merge basic GuC submission + * Basic submission support for all gen11+ platforms + * Not enabled by default on any current platforms but can be enabled via + modparam enable_guc + * Lots of rework will need to be done to integrate with DRM scheduler so + no need to nit pick everything in the code, it just should be + functional, no major coding style / layering errors, and not regress + execlists + * Update IGTs / selftests as needed to work with GuC submission + * Enable CI on supported platforms for a baseline + * Rework / get CI heathly for GuC submission in place as needed +* Merge new parallel submission uAPI + * Bonding uAPI completely incompatible with GuC submission, plus it has + severe design issues in general, which is why we want to retire it no + matter what + * New uAPI adds I915_CONTEXT_ENGINES_EXT_PARALLEL context setup step + which configures a slot with N contexts + * After I915_CONTEXT_ENGINES_EXT_PARALLEL a user can submit N batches to + a slot in a single execbuf IOCTL and the batches run on the GPU in + paralllel + * Initially only for GuC submission but execlists can be supported if + needed +* Convert the i915 to use the DRM scheduler + * GuC submission backend fully integrated with DRM scheduler + * All request queues removed from backend (e.g. all backpressure + handled in DRM scheduler) + * Resets / cancels hook in DRM scheduler + * Watchdog hooks into DRM scheduler + * Lots of complexity of the GuC backend can be pulled out once + integrated with DRM scheduler (e.g. state machine gets + simplier, locking gets simplier, etc...) + * Execlists backend will minimum required to hook in the DRM scheduler + * Legacy interface + * Features like timeslicing / preemption / virtual engines would + be difficult to integrate with the DRM scheduler and these + features are not required for GuC submission as the GuC does + these things for us + * ROI low on fully integrating into DRM scheduler + * Fully integrating would add lots of complexity to DRM + scheduler + * Port i915 priority inheritance / boosting feature in DRM scheduler + * Used for i915 page flip, may be useful to other DRM drivers as + well + * Will be an optional feature in the DRM scheduler + * Remove in-order completion assumptions from DRM scheduler + * Even when using the DRM scheduler the backends will handle + preemption, timeslicing, etc... so it is possible for jobs to + finish out of order + * Pull out i915 priority levels and use DRM priority levels + * Optimize DRM scheduler as needed + +TODOs for GuC submission upstream +================================= + +* Need an update to GuC firmware / i915 to enable error state capture +* Open source tool to decode GuC logs +* Public GuC spec + +New uAPI for basic GuC submission +================================= +No major changes are required to the uAPI for basic GuC submission. The only +change is a new scheduler attribute: I915_SCHEDULER_CAP_STATIC_PRIORITY_MAP. +This attribute indicates the 2k i915 user priority levels are statically mapped +into 3 levels as follows: + +* -1k to -1 Low priority +* 0 Medium priority +* 1 to 1k High priority + +This is needed because the GuC only has 4 priority bands. The highest priority +band is reserved with the kernel. This aligns with the DRM scheduler priority +levels too. + +Spec references: +---------------- +* https://www.khronos.org/registry/EGL/extensions/IMG/EGL_IMG_context_priority.txt +* https://www.khronos.org/registry/vulkan/specs/1.2-extensions/html/chap5.html#devsandqueues-priority +* https://spec.oneapi.com/level-zero/latest/core/api.html#ze-command-queue-priority-t + +New parallel submission uAPI +============================ +The existing bonding uAPI is completely broken with GuC submission because +whether a submission is a single context submit or parallel submit isn't known +until execbuf time activated via the I915_SUBMIT_FENCE. To submit multiple +contexts in parallel with the GuC the context must be explicitly registered with +N contexts and all N contexts must be submitted in a single command to the GuC. +The GuC interfaces do not support dynamically changing between N contexts as the +bonding uAPI does. Hence the need for a new parallel submission interface. Also +the legacy bonding uAPI is quite confusing and not intuitive at all. Furthermore +I915_SUBMIT_FENCE is by design a future fence, so not really something we should +continue to support. + +The new parallel submission uAPI consists of 3 parts: + +* Export engines logical mapping +* A 'set_parallel' extension to configure contexts for parallel + submission +* Extend execbuf2 IOCTL to support submitting N BBs in a single IOCTL + +Export engines logical mapping +------------------------------ +Certain use cases require BBs to be placed on engine instances in logical order +(e.g. split-frame on gen11+). The logical mapping of engine instances can change +based on fusing. Rather than making UMDs be aware of fusing, simply expose the +logical mapping with the existing query engine info IOCTL. Also the GuC +submission interface currently only supports submitting multiple contexts to +engines in logical order which is a new requirement compared to execlists. +Lastly, all current platforms have at most 2 engine instances and the logical +order is the same as uAPI order. This will change on platforms with more than 2 +engine instances. + +A single bit will be added to drm_i915_engine_info.flags indicating that the +logical instance has been returned and a new field, +drm_i915_engine_info.logical_instance, returns the logical instance. + +A 'set_parallel' extension to configure contexts for parallel submission +------------------------------------------------------------------------ +The 'set_parallel' extension configures a slot for parallel submission of N BBs. +It is a setup step that must be called before using any of the contexts. See +I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE or I915_CONTEXT_ENGINES_EXT_BOND for +similar existing examples. Once a slot is configured for parallel submission the +execbuf2 IOCTL can be called submitting N BBs in a single IOCTL. Initially only +supports GuC submission. Execlists supports can be added later if needed. + +Add I915_CONTEXT_ENGINES_EXT_PARALLEL_SUBMIT and +drm_i915_context_engines_parallel_submit to the uAPI to implement this +extension. + +.. kernel-doc:: Documentation/gpu/rfc/i915_parallel_execbuf.h + :functions: drm_i915_context_engines_parallel_submit + +Extend execbuf2 IOCTL to support submitting N BBs in a single IOCTL +------------------------------------------------------------------- +Contexts that have been configured with the 'set_parallel' extension can only +submit N BBs in a single execbuf2 IOCTL. The BBs are either the last N objects +in the drm_i915_gem_exec_object2 list or the first N if I915_EXEC_BATCH_FIRST is +set. The number of BBs is implicit based on the slot submitted and how it has +been configured by 'set_parallel' or other extensions. No uAPI changes are +required to the execbuf2 IOCTL. diff --git a/Documentation/gpu/rfc/index.rst b/Documentation/gpu/rfc/index.rst index 05670442ca1b..91e93a705230 100644 --- a/Documentation/gpu/rfc/index.rst +++ b/Documentation/gpu/rfc/index.rst @@ -19,3 +19,7 @@ host such documentation: .. toctree:: i915_gem_lmem.rst + +.. toctree:: + + i915_scheduler.rst |