aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2018-04-04IB/uverbs: Introduce egress flow steeringBoris Pismenny
The egress flag indicates that this flow steering rule is for egress traffic. The scope of an egress rule is port-wide, meaning all packets originated from that port, which match the steering rule specification will be effected by this steering rule's action. Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Boris Pismenny <borisp@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-04IB/uverbs: Add action_handle flow steering specificationMatan Barak
Binding a flow_action to flow steering rule requires using a new specification. Therefore, adding such an IB_FLOW_SPEC_ACTION_HANDLE flow specification. Flow steering rules could use flow_action(s) and as of that we need to avoid deleting flow_action(s) as long as they're being used. Moreover, when the attached rules are deleted, action_handle reference count should be decremented. Introducing a new mechanism of flow resources to keep track on the attached action_handle(s). Later on, this mechanism should be extended to other attached flow steering resources like flow counters. Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-04IB/uverbs: Add flow_action create and destroy verbsMatan Barak
A verbs application may receive and transmits packets using a data path pipeline. Sometimes, the first stage in the receive pipeline or the last stage in the transmit pipeline involves transforming a packet, either in order to make it easier for later stages to process it or to prepare it for transmission over the wire. Such transformation could be stripping/encapsulating the packet (i.e. vxlan), decrypting/encrypting it (i.e. ipsec), altering headers, doing some complex FPGA changes, etc. Some hardware could do such transformations without software data path intervention at all. The flow steering API supports steering a packet (either to a QP or dropping it) and some simple packet immutable actions (i.e. tagging a packet). Complex actions, that may change the packet, could bloat the flow steering API extensively. Sometimes the same action should be applied to several flows. In this case, it's easier to bind several flows to the same action and modify it than change all matching flows. Introducing a new flow_action object that abstracts any packet transformation (out of a standard and well defined set of actions). This flow_action object could be tied to a flow steering rule via a new specification. Currently, we support esp flow_action, which encrypts or decrypts a packet according to the given parameters. However, we present a flexible schema that could be used to other transformation actions tied to flow rules. Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-04IB/uverbs: Refactor kern_spec_to_ib_spec_filterMatan Barak
The current implementation of kern_spec_to_ib_spec_filter, which takes a uAPI based flow steering specification and creates the respective kernel API flow steering structure, gets a ib_uverbs_flow_spec structure. The new flow_action uAPI gets a match mask and filter from user-space which aren't encoded in the flow steering's ib_uverbs_flow_spec structure. Exporting the logic out of kern_spec_to_ib_spec_filter to get user-space blobs rather than ib_uverbs_flow_spec structure. Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-04IB/mlx4: Check for egress flow steeringBoris Pismenny
ConnectX3 doesn't support egress flow steering. Return an EOPNOTSUPP error when such a flow is being created. Signed-off-by: Boris Pismenny <borisp@mellanox.com> Reviewed-by: Aviad Yehezkel <aviadye@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-04IB/uverbs: Add enum attribute type to ioctl() interfaceMatan Barak
Methods sometimes need to get one attribute out of a group of pre-defined attributes. This is an enum-like behavior. Since this is a common requirement, we add a new ENUM attribute to the generic uverbs ioctl() layer. This attribute is embedded in methods, like any other attributes we currently have. ENUM attributes point to an array of standard UVERBS_ATTR_PTR_IN. The user-space encodes the enum's attribute id in the id field and the internal PTR_IN attr id in the enum_data.elem_id field. This ENUM attribute could be shared by several attributes and it can get UVERBS_ATTR_SPEC_F_MANDATORY flag, stating this attribute must be supported by the kernel, like any other attribute. Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-04IB/mlx5: Initialize the parsing tree root without the help of uverbsMatan Barak
In order to have a custom parsing tree, a provider driver needs to assign its parsing tree to ib_device specs_tree field. Otherwise, the uverbs client assigns a common default parsing tree for it. In downstream patches, the mlx5_ib driver gains a custom parsing tree, which contains both the common objects and a new flags field for the UVERBS_FLOW_ACTION_ESP_CREATE command. This patch makes mlx5_ib assign its own tree to specs_root, which later on will be extended. Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-03RDMA: Use ib_gid_attr during GID modificationParav Pandit
Now that ib_gid_attr contains device, port and index, simplify the provider APIs add_gid() and del_gid() to use device, port and index fields from the ib_gid_attr attributes structure. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-03IB/providers: Avoid null netdev check for RoCEParav Pandit
Now that IB core GID cache ensures that all RoCE entries have an associated netdev remove null checks from the provider drivers for clarity. Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-03IB/providers: Avoid zero GID check for RoCEParav Pandit
Now that the IB core GID cache ensures that a zero GID doesn't exist in the GID table remove zero GID checks from the provider drivers for clarity. Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-03IB/core: Refactor GID modify code for RoCEParav Pandit
Code is refactored to prepare separate functions for RoCE which can do more complex operations related to reference counting, while still maintainining code readability. This includes (a) Simplification to not perform netdevice checks and modifications for IB link layer. (b) Do not add RoCE GID entry which has NULL netdevice; instead return an error. (c) If GID addition fails at provider level add_gid(), do not add the entry in the cache and keep the entry marked as INVALID. (d) Simplify and reuse the ib_cache_gid_add()/del() routines so that they can be used even for modifying default GIDs. This avoid some code duplication in modifying default GIDs. (e) find_gid() routine refers to the data entry flags to qualify a GID as valid or invalid GID rather than depending on attributes and zeroness of the GID content. (f) gid_table_reserve_default() sets the GID default attribute at beginning while setting up the GID table. There is no need to use default_gid flag in low level functions such as write_gid(), add_gid(), del_gid(), as they never need to update the DEFAULT property of the GID entry while during GID table update. As as result of this refactor, reserved GID 0:0:0:0:0:0:0:0 is no longer searchable as described below. A unicast GID entry of 0:0:0:0:0:0:0:0 is Reserved GID as per the IB spec version 1.3 section 4.1.1, point (6) whose snippet is below. "The unicast GID address 0:0:0:0:0:0:0:0 is reserved - referred to as the Reserved GID. It shall never be assigned to any endport. It shall not be used as a destination address or in a global routing header (GRH)." GID table cache now only stores valid GID entries. Before this patch, Reserved GID 0:0:0:0:0:0:0:0 was searchable in the GID table using ib_find_cached_gid_by_port() and other similar find routines. Zero GID is no longer searchable as it shall not to be present in GRH or path recored entry as described in IB spec version 1.3 section 4.1.1, point (6), section 12.7.10 and section 12.7.20. ib_cache_update() is simplified to check link layer once, use unified locking scheme for all link layers, removed temporary gid table allocation/free logic. Additionally, (a) Expand ib_gid_attr to store port and index so that GID query routines can get port and index information from the attribute structure. (b) Expand ib_gid_attr to store device as well so that in future code when GID reference counting is done, device is used to reach back to the GID table entry. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-03IB/core: Simplify ib_query_gid to always refer to cacheParav Pandit
Currently following inconsistencies exist. 1. ib_query_gid() returns GID from the software cache for a RoCE port and returns GID from the HCA for an IB port. This is incorrect because software GID cache is maintained regardless of HCA port type. 2. GID is queries from the HCA via ib_query_gid and updated in the software cache for IB link layer. Both of them might not be in sync. ULPs such as SRP initiator, SRP target, IPoIB driver have historically used ib_query_gid() API to query the GID. However CM used cached version during CM processing, When software cache was introduced, this inconsitency remained. In order to simplify, improve readability and avoid link layer specific above inconsistencies, this patch brings following changes. 1. ib_query_gid() always refers to the cache layer regardless of link layer. 2. cache module who reads the GID entry from HCA and builds the cache, directly invokes the HCA provider verb's query_gid() callback function. 3. ib_query_port() is being called in early stage where GID cache is not yet build while reading port immutable property. Therefore it needs to read the default GID from the HCA for IB link layer to publish the subnet prefix. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-03RDMA/providers: Simplify query_gid callback of RoCE providersParav Pandit
ib_query_gid() fetches the GID from the software cache maintained in ib_core for RoCE ports. Therefore, simplify the provider drivers for RoCE to treat query_gid() callback as never called for RoCE, and only require non-RoCE devices to implement it. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-03RDMA/core: Update query_gid documentation for HCA driversParav Pandit
query_gid() should return right GID value for iWarp and IB link layers. It is a no-op for RoCE link layer. Update the documentation to reflect this. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-03RDMA/ucma: Don't allow setting RDMA_OPTION_IB_PATH without an RDMA deviceRoland Dreier
Check to make sure that ctx->cm_id->device is set before we use it. Otherwise userspace can trigger a NULL dereference by doing RDMA_USER_CM_CMD_SET_OPTION on an ID that is not bound to a device. Cc: <stable@vger.kernel.org> Reported-by: <syzbot+a67bc93e14682d92fc2f@syzkaller.appspotmail.com> Signed-off-by: Roland Dreier <roland@purestorage.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-03IB/qedr: Remove GID add/del dummy routinesParav Pandit
qedr driver's add_gid() and del_gid() callbacks are doing simple checks which are already done by the ib core before invoking these callback routines. Therefore, code is simplified to skip implementing add_gid() and del_gid() callback functions. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-03i40iw: Remove pre-production workaround for resource profile 1Shiraz Saleem
Support for resource profile 1 is currenlty deprecated due to a pre-production errata. Remove this workaround as its no longer needed. Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-03RDMA/mlx5: Fix definition of mlx5_ib_create_qp_respJason Gunthorpe
This structure is pushed down the ex and the non-ex path, so it needs to be aligned to 8 bytes to go through ex without implicit padding. Old user space will provide 4 bytes of resp on !ex and 8 bytes on ex, so take the approach of just copying the minimum length. New user space will consistently provide 8 bytes in both cases. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-03IB/ocrdma_hw: Remove redundant checks and goto labelsGustavo A. R. Silva
Check on return values and goto label mbx_err are unnecessary. Addresses-Coverity-ID: 1271151 ("Identical code for different branches") Addresses-Coverity-ID: 1268788 ("Identical code for different branches") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-03IB/ipoib: Delete unused structYuval Shaia
This structure is not needed since the introduction of commit 'c42687784b9a ("IB/ipoib: Scatter-Gather support in connected mode")' Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Zhu Yanjun <yanjun.zhu@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-29iw_cxgb4: print mapped ports correctlyBharat Potnuri
c4iw_ep_common structure holds the mapped addresses, so while printing them, use appropriate pointers. Fixes: bab572f1d ("iw_cxgb4: Guard against null cm_id in dump_ep/qp") Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-29RDMA/cma: Move rdma_cm_state to cma_priv.hParav Pandit
rdma_cm_state enum is internal to rdma_cm kernel module. It is not required to expose state enums to ULP modules. So lets keep its scope limited to rdma_cm module in cma_priv.h file. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-29IB/addr: Constify dst_entry pointerParav Pandit
Make dst_entry pointer as const struct dst_entry* to improve code readablity to make sure that dst structure fields are not modified by various functions which are using it. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-29RDMA: Use u64_to_user_ptr everywhereJason Gunthorpe
This is already used in many places, get the rest of them too, only to make the code a bit clearer & simpler. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-29RDMA/nldev: Provide netdevice name and indexLeon Romanovsky
Export the net device name and index to easily find connection between IB devices and relevant net devices. We also updated the comment regarding the devices without FW. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-29IB/rxe: optimize mcast recv processZhu Yanjun
In mcast recv process, the function skb_clone is used. In fact, the refcount can be increased to replace cloning a new skb since the original skb will not be modified before it is freed. This can make the performance better and save the memory. CC: Srinivas Eeda <srinivas.eeda@oracle.com> CC: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-29qedr: Fix spelling mistake: "hanlde" -> "handle"Colin Ian King
Trivial fix to spelling mistake in DP_ERR message text Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-28RDMA/CMA: Add rdma_port_space to UAPISteve Wise
Since the rdma_port_space enum is being passed between user and kernel for user cm_id setup, we need it in a UAPI header. So add it to rdma_user_cm.h. This also fixes the cm_id restrack changes which pass up the port space value via the RDMA_NLDEV_ATTR_RES_PS attribute. Fixes: 00313983cda6 ("RDMA/nldev: provide detailed CM_ID information") Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-28RDMA/CMA: remove RDMA_PS_SDPSteve Wise
This is no longer supported, so remove it. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-27IB/core: Refer to RoCE port property to decide building cacheParav Pandit
IB core maintains the GID cache entries for the GID table. This cache table has to be maintained regardless of HCA's support of GID table. For IB and iWarp ports, cache is created by querying the HCA. For RoCE cache is created based on netdev events. Therefore just refer to the RoCE port property of the {device, port} to decide whether to build cache by querying HCA or from netdev events. There is no need to check if HCA support GID table or not. ib_cache_update() referred to RoCE attribute before validating port. Though in all current callers port is valid, it is incorrect to query RoCE port property before validating the port. Therefore, rdma_protocol_roce() check is done after rdma_is_port_valid() verifies that port is valid. Fixes: 115b68aa6ea4 ("IB/ocrdma: Removed GID add/del null routines") Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-27IB/core: Search GID only for IB link layerParav Pandit
Even though API is only used by IPoIB driver, its incorrect to refer RoCE GID table property to search for GID. Look for only IB link layer to search for the GID. Fixes: dbb12562f7c2 ("IB/{core, ipoib}: Simplify ib_find_gid to search only for IB link layer") Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-27IB/core: Refer to RoCE port property instead of GID table propertyParav Pandit
ib_find_gid_by_filter() searches GID with filter only for RoCE link layer regardless of HCA's support for GID table. Therefore, right way to lookup is compare RoCE port property and not the GID table property. Fixes: 99b27e3b5da0 ("IB/cache: Add ib_find_gid_by_filter cache API") Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-27IB/core: Generate GID change event regardless of RoCE GID table propertyParav Pandit
Due to following reasons, GID table event is generated regardless of GID table property. 1. GID table cache is maintained at ib core layer regardless of link layer. 2. GID change event has no relation with IB link layer. 3. GID change event also doesn't depend on whether HCA supports GID table or not. Fixes: f3906bd36087 ("IB/core: Refactor GID cache's ib_dispatch_event") Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-27IB/cm: Block processing alternate path handling RoCE Rx cm messagesParav Pandit
Due to below reasons, it is better to not support alternate path receive messages for RoCE in near term. 1. Alternate path for RoCE is not supported at rdmacm layer. 2. It is not supported in uverbs/core layer for RoCE. 3. Alternate path for IPv6 for link local address cannot resolve route determinstically without a valid incoming interface id whose usecase make sense only with dual port mode. 4. init_av_from_path while processing LAP messages for IB and RoCE can lead to adding duplicate entry of AV into the port list, leads to list corruption. 5. rdma-core userspace a well known userspace implementation has removed support of libucm which use ucm.ko module, which is the only module that can trigger alternate path related messages. 6. ucm kernel module is requested to be removed from the IB core in patch [1]. [1] https://patchwork.kernel.org/patch/10268503/ Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-27IB/core: Protect against concurrent access to hardware statsMark Bloch
Currently access to hardware stats buffer isn't protected, this can result in multiple writes and reads at the same time to the same memory location. This can lead to providing an incorrect value to the user. Add a mutex to protect against it. Fixes: b40f4757daa1 ("IB/core: Make device counter infrastructure dynamic") Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-27IB/mlx5: Respect new UMR capabilitiesMajd Dibbiny
In some firmware configuration, UMR usage from Virtual Functions is restricted. This information is published to the driver using new capability bits. Avoid using UMRs in these cases and use the Firmware slow-path flow to create mkeys and populate them with Virtual to Physical address translation. Older drivers that do not have this patch, will end up using memory keys that aren't populated with Virtual to Physical address translation that is done part of the UMR work. Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Tested-by: Laurence Oberman <loberman@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-27IB/mlx5: Enable ECN capable bits for UD RoCE v2 QPsMajd Dibbiny
When working with RC QPs, the FW sets the ECN capable bits for all the RoCE v2 packets. On the other hand, for UD QPs, the driver needs to set the the ECN capable bits in the Address Handler since the HW generates each packet according to the Address Handler and not the QP context. If ECN is not enabled in NIC or switch, these bits are ignored. Fixes: 2811ba51b049 ("IB/mlx5: Add RoCE fields to Address Vector") Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-27IB/uverbs: UAPI pointers should use __aligned_u64 typeMatan Barak
The ioctl() UAPIs are meant to be used by both user-space and kernel ioctl() handlers. Mostly, these UAPI structs tend to consist of simple types, but sometimes user-space pointers may be passed between user-space and kernel. We would like to avoid dereferencing a user-space pointer in the kernel, thus - we always define RDMA_UAPI_PTR as a __aligned_u64 type. Fixes: 1f7ff9d5d36a ('IB/uverbs: Move to new headers and make naming consistent') Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-27Merge branch '32compat'Jason Gunthorpe
The design of the uAPI had intended all structs to share the same layout on 32 and 64 bit compiles. Unfortunately over the years some errors have crept in. This series fixes all the incompatabilities. It goes along with a userspace rdma-core series that causes the providers to use these structs directly and then does various self-checks on the command formation. Those checks were combined with output from pahole on 32 and 64 bit compiles to confirm that the structure layouts are the same. This series does not make implicit padding explicit, as long as the implicit padding is the same on 32 and 64 bit compiles. Finally, the issue is put to rest by using __aligned_u64 in the uapi headers, if new code copies that type, and is checked in userspace, it is unlikely we will see problems in future. There are two patches that break the ABI for a 32 bit kernel, one for rxe and one for mlx4. Both patches have notes, but the overall feeling from Doug and I is that providing compat is just too difficult and not necessary since there is no real user of a 32 bit userspace and 32 bit kernel for various good reasons. The 32 bit userspace / 64 bit kernel case however does seem to have some real users and does need to work as designed. * 32compat: RDMA: Change all uapi headers to use __aligned_u64 instead of __u64 RDMA/rxe: Fix uABI structure layouts for 32/64 compat RDMA/mlx4: Fix uABI structure layouts for 32/64 compat RDMA/qedr: Fix uABI structure layouts for 32/64 compat RDMA/ucma: Fix uABI structure layouts for 32/64 compat RDMA: Remove minor pahole differences between 32/64
2018-03-27RDMA: Change all uapi headers to use __aligned_u64 instead of __u64Jason Gunthorpe
The new auditing standard for the subsystem will be to only use __aligned_64 in uapi headers to try and prevent 32/64 compat bugs from existing in the future. Changing all existing usage will help ensure new developers copy the right idea. The before/after of this patch was tested using pahole on 32 and 64 bit compiles to confirm it has no change in the structure layout, so this patch is a NOP. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-27RDMA/rxe: Fix uABI structure layouts for 32/64 compatJason Gunthorpe
With 32 bit compilation several of the fields become misaligned here. Fixing this is an ABI break for 32 bit rxe and it is in well used portions of the rxe ABI. To handle this we bump the ABI version, as expected. However the user space driver doesn't handle it properly today, so all existing user space continues to work. Updated userspace will start to require the necessary kernel version. We don't expect there to be any 32 bit users of rxe. Most likely cases, such as ARM 32 already generally don't work because rxe does not handle the CPU cache properly on its shared with userspace pages. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-27RDMA/mlx4: Fix uABI structure layouts for 32/64 compatJason Gunthorpe
rss_caps in struct mlx4_uverbs_ex_query_device_resp is misaligned on 32 bit compared to 64 bit, add explicit padding. The rss caps were introduced recently and are very rarely used in user space, mainly for DPDK. We don't expect there to be a real 32 bit user, so this change is done without compat considerations. Fixes: 09d208b258a2 ("IB/mlx4: Add report for RSS capabilities by vendor channel") Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-27RDMA/qedr: Fix uABI structure layouts for 32/64 compatJason Gunthorpe
struct qedr_alloc_ucontext_resp is a different length in 32 and 64 bit compiles due to implicit compiler padding. The structs alloc_pd_uresp, create_cq_uresp and create_qp_uresp are not padded by the compiler, but in user space the compiler pads them due to the way the core and driver structs are concatenated. Make this padding explicit and consistent for future sanity. The kernel driver can already handle the user buffer being smaller than required and copies correctly, so no compat or ABI break happens from introducing the explicit padding. Acked-by: Michal Kalderon <michal.kalderon@cavium.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-27RDMA/ucma: Fix uABI structure layouts for 32/64 compatJason Gunthorpe
The rdma_ucm_event_resp is a different length on 32 and 64 bit compiles. The kernel requires it to be the expected length or longer so 32 bit builds running on a 64 bit kernel will not work. Retain full compat by having all kernels accept a struct with or without the trailing reserved field. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-27RDMA: Remove minor pahole differences between 32/64Jason Gunthorpe
To help automatic detection we want pahole to report the same struct layouts for 32 and 64 bit compiles. These cases are all implicit padding added at the end of embedded structs as part of a union. The added reserved fields have no impact on the ABI. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-23RDMA/ocrdma: Fix structure layout for ocrdma_alloc_pdJason Gunthorpe
The udata's for alloc_pd cannot contain u64s due to alignment constraints. Switch the two never-used u64's to arrays of u32 to reduce the required struct alignment to 4 bytes. These reserved fields are totally unnecessary, never written and never read. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-23iw_cxgb4: Add ib_device->get_netdev supportSteve Wise
This is useful to rdma ULPs. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-23IB/cma: Resolve route only while receiving CM requestsParav Pandit
Currently CM request for RoCE follows following flow. rdma_create_id() rdma_resolve_addr() rdma_resolve_route() For RC QPs: rdma_connect() ->cma_connect_ib() ->ib_send_cm_req() ->cm_init_av_by_path() ->ib_init_ah_attr_from_path() For UD QPs: rdma_connect() ->cma_resolve_ib_udp() ->ib_send_cm_sidr_req() ->cm_init_av_by_path() ->ib_init_ah_attr_from_path() In both the flows, route is already resolved before sending CM requests. Therefore, code is refactored to avoid resolving route second time in ib_cm layer. ib_init_ah_attr_from_path() is extended to resolve route when it is not yet resolved for RoCE link layer. This is achieved by caller setting route_resolved field in path record whenever it has route already resolved. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-22IB/core: Refer to RoCE port property instead of GID table propertyParav Pandit
ib_query_gid() in commit [1] refers to RoCE GID table capability of the HCA using rdma_cap_roce_gid_table(). ib_core maintains the GID table cache regardless of the HCA provider drivers capability to maintain RoCE GID table. Therefore, whether to return a GID table entry from the software cache or from HCA should be done based on whether the port is RoCE or not. [1] commit 03db3a2d81e6 ("IB/core: Add RoCE GID table management") Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-22RDMA/restrack: Remove ambiguity in resource track clean logicLeon Romanovsky
The restrack clean routine had simple, but powerful WARN_ON check to see if all resources are cleared prior to releasing device. The WARN_ON check performed very well, but lack of information which device caused to resource leak, the object type and origin made debug to be fun and challenging at the same time. The fact that all dumps were the same because restrack_clean() is called in dealloc() didn't help either. So let's fix spelling error and convert WARN_ON to be more debug friendly. The dmesg cut below gives example of how the output will look output for the case fixed in patch [1] [ 438.421372] restrack: ------------[ cut here ]------------ [ 438.423448] restrack: BUG: RESTRACK detected leak of resources on mlx5_2 [ 438.425600] restrack: Kernel PD object allocated by mlx5_ib is not freed [ 438.427753] restrack: Kernel CQ object allocated by mlx5_ib is not freed [ 438.429660] restrack: ------------[ cut here ]------------ [1] https://patchwork.kernel.org/patch/10298695/ Cc: Michal Kalderon <Michal.Kalderon@cavium.com> Cc: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>