Age | Commit message (Collapse) | Author |
|
Make the sunrpc trace subsystem trace events easier to use.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Commit 156708adf2d9 ("SUNRPC: Move the svc_xdr_recvfrom()
tracepoint") tried to capture the correct XID in the trace record,
but this line in svc_recv:
rqstp->rq_xid = svc_getu32(&rqstp->rq_arg.head[0]);
alters the size of rq_arg.head[0].iov_len. The tracepoint records
the correct XID but an incorrect value for the length of the
xdr_buf's head.
To keep the trace callsites simple, I've created two trace classes.
One assumes the xdr_buf contains a full RPC message, and the XID
can be extracted from it. The other assumes the contents of the
xdr_buf are arbitrary, and the xid will be provided by the caller.
Currently there is only one user of each class, but I expect we will
need a few more tracepoints using each class as time goes on.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Commit c509f15a5801 ("SUNRPC: Split the xdr_buf event class") added
display of the rqst's XID to the svc_xdr_buf_class. However, when
the recvfrom tracepoint fires, rq_xid has yet to be filled in with
the current XID. So it ends up recording the previous XID that was
handled by that svc_rqst.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Backchannel rpc tasks don't have task->tk_client set, so it's necessary
to check it for NULL before dereferencing.
Fixes: c509f15a5801 ("SUNRPC: Split the xdr_buf event class")
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
The TP_fast_assign() section is careful enough not to dereference
xdr->rqst if it's NULL. The TP_STRUCT__entry section is not.
Fixes: 5582863f450c ("SUNRPC: Add XDR overflow trace event")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Remove redundant call sites or call sites that are already covered
by tracepoints.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
Remove several redundant dprintk call sites, and replace a couple of
potentially useful ones with tracepoints.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
In many cases, tracepoints already report these errors. In others,
the dprintks were mainly useful when this code was less mature.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
In many cases, tracepoints already report these errors. In others,
the dprintks were mainly useful when this code was less mature.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
Time to remove dprintk call sites in here.
Regarding the rpc_bind_status tracepoint: It's friendlier to
administrators if they don't have to look up the error code to
figure out what went wrong. Replace trace_rpc_bind_status with a
set of tracepoints that report more specifically what the problem
was, and what RPC program/version was being queried.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
Clean up: Replace dprintk call sites.
Note that rpc_call_rpcerror() already has a trace point, so perhaps
adding trace_rpc_refresh_status() isn't necessary. However, it does
report a particular category of error.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
For a long while we've wanted a tracepoint that fires when a major
timeout is reported in the system log. Such a tracepoint can be
attached to other actions that can take place when a timeout is
detected (eg, server or connection health assessment).
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
This trace event can be used to audit transport connections from the
client.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
"no socket space" is an exceptional and infrequent condition
that troubleshooters want to know about.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
Generate a trace event when an RPC request is queued without being
sent immediately.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
Replace a dprintk() with a tracepoint. The tracepoint marks the
point where an RPC request is assigned an XID.
Additional clean up: Remove trace_xprt_enq_xmit, which reports much
the same thing. That tracepoint was added for debugging commit
918f3c1fe83c ("SUNRPC: Improve latency for interactive tasks").
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
Introduce a tracepoint in call_allocate that reports the exact
sizes in the RPC buffer allocation request and the status of the
result. This helps catch problems with XDR buffer provisioning,
and replaces transport-specific debugging instrumentation.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
Request completion is already recorded by an "rpc_task_wakeup
queue=xprt_pending" trace record. A subsequent rpc_xdr_recvfrom
trace record shows the number of bytes received.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
Ensure that show_rqstp_flags() can recognize and display the
RQ_AUTHERR flag, added in commit 83dd59a0b9af ("SUNRPC/nfs: Fix
return value for nfs4_callback_compound()") and the RQ_DATA flag,
added in commit ff3ac5c3dc23 ("SUNRPC: Add a server side
per-connection limit").
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Pull NFS client updates from Anna Schumaker:
"New features and improvements:
- Sunrpc receive buffer sizes only change when establishing a GSS credentials
- Add more sunrpc tracepoints
- Improve on tracepoints to capture internal NFS I/O errors
Other bugfixes and cleanups:
- Move a dprintk() to after a call to nfs_alloc_fattr()
- Fix off-by-one issues in rpc_ntop6
- Fix a few coccicheck warnings
- Use the correct SPDX license identifiers
- Fix rpc_call_done assignment for BIND_CONN_TO_SESSION
- Replace zero-length array with flexible array
- Remove duplicate headers
- Set invalid blocks after NFSv4 writes to update space_used attribute
- Fix direct WRITE throughput regression"
* tag 'nfs-for-5.8-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (27 commits)
NFS: Fix direct WRITE throughput regression
SUNRPC: rpc_xprt lifetime events should record xprt->state
xprtrdma: Make xprt_rdma_slot_table_entries static
nfs: set invalid blocks after NFSv4 writes
NFS: remove redundant initialization of variable result
sunrpc: add missing newline when printing parameter 'auth_hashtable_size' by sysfs
NFS: Add a tracepoint in nfs_set_pgio_error()
NFS: Trace short NFS READs
NFS: nfs_xdr_status should record the procedure name
SUNRPC: Set SOFTCONN when destroying GSS contexts
SUNRPC: rpc_call_null_helper() should set RPC_TASK_SOFT
SUNRPC: rpc_call_null_helper() already sets RPC_TASK_NULLCREDS
SUNRPC: trace RPC client lifetime events
SUNRPC: Trace transport lifetime events
SUNRPC: Split the xdr_buf event class
SUNRPC: Add tracepoint to rpc_call_rpcerror()
SUNRPC: Update the RPC_SHOW_SOCKET() macro
SUNRPC: Update the rpc_show_task_flags() macro
SUNRPC: Trace GSS context lifetimes
SUNRPC: receive buffer size estimation values almost never change
...
|
|
Help troubleshoot the logic that uses these flags.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
The "create" tracepoint records parts of the rpc_create arguments,
and the shutdown tracepoint records when the rpc_clnt is about to
signal pending tasks and destroy auths.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
Refactor: Hoist create/destroy/disconnect tracepoints out of
xprtrdma and into the generic RPC client. Some benefits include:
- Enable tracing of xprt lifetime events for the socket transport
types
- Expose the different types of disconnect to help run down
issues with lingering connections
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
To help tie the recorded xdr_buf to a particular RPC transaction,
the client side version of this class should display task ID
information and the server side one should show the request's XID.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
Add a tracepoint in another common exit point for failing RPCs.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
Clean up: remove unnecessary commas, and fix a white-space nit.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
Recent additions to the RPC_TASK flags neglected to update
the tracepoint ENUM definitions.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
- Rename these so they are easy to enable and search for as a set
- Move the tracepoints to get a more accurate sense of control flow
- Tracepoints should not fire on xprt shutdown
- Display memory address in case data structure had been corrupted
- Abandon dprintk in these paths
I haven't ever gotten one of these tracepoints to trigger. I wonder
if we should simply remove them.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Clean up: Add missing TRACE_DEFINE_ENUMs in
include/trace/events/sunrpc.h
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Clean up. At this point, we are not ready yet to support bio_vecs in
the UDP transport implementation. However, we can clean up
svc_udp_recvfrom() to match the tracing and straight-lining recently
changes made in svc_tcp_recvfrom().
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Clean up: move exception processing out of the main path.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Refactor: svc_recvfrom() is going to be converted to read into
bio_vecs in a moment. Unhook the only other caller,
svc_tcp_recv_record(), which just wants to read the 4-byte stream
record marker into a kvec.
While we're in the area, streamline this helper by straight-lining
the hot path, replace dprintk call sites with tracepoints, and
reduce the number of atomic bit operations in this path.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Report TCP socket state changes and accept failures via
tracepoints, replacing dprintk() call sites.
No tracepoint is added in svc_tcp_listen_data_ready. There's
no information available there that isn't also reported by the
svcsock_new_socket and the accept failure tracepoints.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
In addition to tracing recently-updated socket sendto events, this
commit adds a trace event class that can be used for additional
svcsock-related tracepoints in subsequent patches.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
In lieu of dprintks or tracepoints in each individual transport
implementation, introduce tracepoints in the generic part of the RPC
layer. These typically fire for connection lifetime events, so
shouldn't contribute a lot of noise.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Capture transport creation failures.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Clean up: The xprt=%p was meant to distinguish events from different
transports, but the addr=%s does that just as well and does not
expose kernel memory addresses.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Add basic tracing for debugging the sunrpc cache events.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
This class can be used to create trace points in either the RPC
client or RPC server paths. It simply displays the length of each
part of an xdr_buf, which is useful to determine that the transport
and XDR codecs are operating correctly.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
Pull nfsd updates from Bruce Fields:
"This is a relatively quiet cycle for nfsd, mainly various bugfixes.
Possibly most interesting is Trond's fixes for some callback races
that were due to my incomplete understanding of rpc client shutdown.
Unfortunately at the last minute I've started noticing a new
intermittent failure to send callbacks. As the logic seems basically
correct, I'm leaving Trond's patches in for now, and hope to find a
fix in the next week so I don't have to revert those patches"
* tag 'nfsd-5.5' of git://linux-nfs.org/~bfields/linux: (24 commits)
nfsd: depend on CRYPTO_MD5 for legacy client tracking
NFSD fixing possible null pointer derefering in copy offload
nfsd: check for EBUSY from vfs_rmdir/vfs_unink.
nfsd: Ensure CLONE persists data and metadata changes to the target file
SUNRPC: Fix backchannel latency metrics
nfsd: restore NFSv3 ACL support
nfsd: v4 support requires CRYPTO_SHA256
nfsd: Fix cld_net->cn_tfm initialization
lockd: remove __KERNEL__ ifdefs
sunrpc: remove __KERNEL__ ifdefs
race in exportfs_decode_fh()
nfsd: Drop LIST_HEAD where the variable it declares is never used.
nfsd: document callback_wq serialization of callback code
nfsd: mark cb path down on unknown errors
nfsd: Fix races between nfsd4_cb_release() and nfsd4_shutdown_callback()
nfsd: minor 4.1 callback cleanup
SUNRPC: Fix svcauth_gss_proxy_init()
SUNRPC: Trace gssproxy upcall results
sunrpc: fix crash when cache_head become valid before update
nfsd: remove private bin2hex implementation
...
|
|
RPC tasks on the backchannel never invoke xprt_complete_rqst(), so
there is no way to report their tk_status at completion. Also, any
RPC task that exits via rpc_exit_task() before it is replied to will
also disappear without a trace.
Introduce a trace point that is symmetrical with rpc_task_begin that
captures the termination status of each RPC task.
Sample trace output for callback requests initiated on the server:
kworker/u8:12-448 [003] 127.025240: rpc_task_end: task:50@3 flags=ASYNC|DYNAMIC|SOFT|SOFTCONN|SENT runstate=RUNNING|ACTIVE status=0 action=rpc_exit_task
kworker/u8:12-448 [002] 127.567310: rpc_task_end: task:51@3 flags=ASYNC|DYNAMIC|SOFT|SOFTCONN|SENT runstate=RUNNING|ACTIVE status=0 action=rpc_exit_task
kworker/u8:12-448 [001] 130.506817: rpc_task_end: task:52@3 flags=ASYNC|DYNAMIC|SOFT|SOFTCONN|SENT runstate=RUNNING|ACTIVE status=0 action=rpc_exit_task
Odd, though, that I never see trace_rpc_task_complete, either in the
forward or backchannel. Should it be removed?
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Record results of a GSS proxy ACCEPT_SEC_CONTEXT upcall and the
svc_authenticate() function to make field debugging of NFS server
Kerberos issues easier.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Bill Baker <bill.baker@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
To help debug problems with RPC/RDMA credit management, replace
dprintk() call sites in the transport send lock paths with trace
events.
Similar trace points are defined for the non-congestion paths.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
Pull NFS client updates from Anna Schumaker:
"Highlights include:
Stable bugfixes:
- Fall back to MDS if no deviceid is found rather than aborting # v4.11+
- NFS4: Fix v4.0 client state corruption when mount
Features:
- Much improved handling of soft mounts with NFS v4.0:
- Reduce risk of false positive timeouts
- Faster failover of reads and writes after a timeout
- Added a "softerr" mount option to return ETIMEDOUT instead of
EIO to the application after a timeout
- Increase number of xprtrdma backchannel requests
- Add additional xprtrdma tracepoints
- Improved send completion batching for xprtrdma
Other bugfixes and cleanups:
- Return -EINVAL when NFS v4.2 is passed an invalid dedup mode
- Reduce usage of GFP_ATOMIC pages in SUNRPC
- Various minor NFS over RDMA cleanups and bugfixes
- Use the correct container namespace for upcalls
- Don't share superblocks between user namespaces
- Various other container fixes
- Make nfs_match_client() killable to prevent soft lockups
- Don't mark all open state for recovery when handling recallable
state revoked flag"
* tag 'nfs-for-5.2-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (69 commits)
SUNRPC: Rebalance a kref in auth_gss.c
NFS: Fix a double unlock from nfs_match,get_client
nfs: pass the correct prototype to read_cache_page
NFSv4: don't mark all open state for recovery when handling recallable state revoked flag
SUNRPC: Fix an error code in gss_alloc_msg()
SUNRPC: task should be exit if encode return EKEYEXPIRED more times
NFS4: Fix v4.0 client state corruption when mount
PNFS fallback to MDS if no deviceid found
NFS: make nfs_match_client killable
lockd: Store the lockd client credential in struct nlm_host
NFS: When mounting, don't share filesystems between different user namespaces
NFS: Convert NFSv2 to use the container user namespace
NFSv4: Convert the NFS client idmapper to use the container user namespace
NFS: Convert NFSv3 to use the container user namespace
SUNRPC: Use namespace of listening daemon in the client AUTH_GSS upcall
SUNRPC: Use the client user namespace when encoding creds
NFS: Store the credential of the mount process in the nfs_server
SUNRPC: Cache cred of process creating the rpc_client
xprtrdma: Remove stale comment
xprtrdma: Update comments that reference ib_drain_qp
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk
Pull printk updates from Petr Mladek:
- Allow state reset of printk_once() calls.
- Prevent crashes when dereferencing invalid pointers in vsprintf().
Only the first byte is checked for simplicity.
- Make vsprintf warnings consistent and inlined.
- Treewide conversion of obsolete %pf, %pF to %ps, %pF printf
modifiers.
- Some clean up of vsprintf and test_printf code.
* tag 'printk-for-5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk:
lib/vsprintf: Make function pointer_string static
vsprintf: Limit the length of inlined error messages
vsprintf: Avoid confusion between invalid address and value
vsprintf: Prevent crash when dereferencing invalid pointers
vsprintf: Consolidate handling of unknown pointer specifiers
vsprintf: Factor out %pO handler as kobject_string()
vsprintf: Factor out %pV handler as va_format()
vsprintf: Factor out %p[iI] handler as ip_addr_string()
vsprintf: Do not check address of well-known strings
vsprintf: Consistent %pK handling for kptr_restrict == 0
vsprintf: Shuffle restricted_pointer()
printk: Tie printk_once / printk_deferred_once into .data.once for reset
treewide: Switch printk users from %pf and %pF to %ps and %pS, respectively
lib/test_printf: Switch to bitmap_zalloc()
|
|
Add a helper to ensure that debugfs and friends print out the
correct current task timeout value.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|