Age | Commit message (Collapse) | Author |
|
git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-next-2.6
Conflicts:
net/netfilter/nf_conntrack_netlink.c
|
|
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
drivers/net/hp-plus.c
drivers/net/wireless/ath5k/base.c
drivers/net/wireless/ath9k/recv.c
net/wireless/reg.c
|
|
Instead of using one atomic_t per protocol, use a percpu_counter
for "orphan_count", to reduce cache line contention on
heavy duty network servers.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Instead of using one atomic_t per protocol, use a percpu_counter
for "sockets_allocated", to reduce cache line contention on
heavy duty network servers.
Note : We revert commit (248969ae31e1b3276fc4399d67ce29a5d81e6fd9
net: af_unix can make unix_nr_socks visbile in /proc),
since it is not anymore used after sock_prot_inuse_add() addition
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pass netns pointer to struct xfrm_policy_afinfo::garbage_collect()
[This needs more thoughts on what to do with dst_ops]
[Currently stub to init_net]
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pass netns to xfrm_lookup()/__xfrm_lookup(). For that pass netns
to flow_cache_lookup() and resolver callback.
Take it from socket or netdevice. Stub DECnet to init_net.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
To avoid unnecessary complications with passing netns around.
* set once, very early after allocating
* once set, never changes
For a while create every xfrm_state in init_net.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Impact: Optimization
Like done in inet_unhash(), we can avoid taking a chain lock if
socket is not hashed in udp_unhash()
Triggered by close(socket(AF_INET, SOCK_DGRAM, 0));
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch let nfmark to be evaluated for routing decision for OUTPUT
packet, in mangle table, when process paquet in NFQUEUE
Until now, only change (in NFQUEUE process) on fields src_addr,
dest_addr and tos could make netfilter to reevalute the routing.
From: Laurent Licour <laurent@licour.com>
Signed-off-by: Eric Leblond <eric@inl.fr>
Signed-off-by: Patrick McHardy <kaber@trash.net>
|
|
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
struct will be kfreed in a moment, so...
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The earlier version was just very basic one which is "playing
safe" by always clearing the hints. However, clearing of a hint
is extremely costly operation with large windows, so it must be
avoided at all cost whenever possible, there is a way with
shifting too achieve not-clearing.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
During SACK processing, most of the benefits of TSO are eaten by
the SACK blocks that one-by-one fragment SKBs to MSS sized chunks.
Then we're in problems when cleanup work for them has to be done
when a large cumulative ACK comes. Try to return back to pre-split
state already while more and more SACK info gets discovered by
combining newly discovered SACK areas with the previous skb if
that's SACKed as well.
This approach has a number of benefits:
1) The processing overhead is spread more equally over the RTT
2) Write queue has less skbs to process (affect everything
which has to walk in the queue past the sacked areas)
3) Write queue is consistent whole the time, so no other parts
of TCP has to be aware of this (this was not the case with
some other approach that was, well, quite intrusive all
around).
4) Clean_rtx_queue can release most of the pages using single
put_page instead of previous PAGE_SIZE/mss+1 calls
In case a hole is fully filled by the new SACK block, we attempt
to combine the next skb too which allows construction of skbs
that are even larger than what tso split them to and it handles
hole per on every nth patterns that often occur during slow start
overshoot pretty nicely. Though this to be really useful also
a retransmission would have to get lost since cumulative ACKs
advance one hole at a time in the most typical case.
TODO: handle upwards only merging. That should be rather easy
when segment is fully sacked but I'm leaving that as future
work item (it won't make very large difference anyway since
this current approach already covers quite a lot of normal
cases).
I was earlier thinking of some sophisticated way of tracking
timestamps of the first and the last segment but later on
realized that it won't be that necessary at all to store the
timestamp of the last segment. The cases that can occur are
basically either:
1) ambiguous => no sensible measurement can be taken anyway
2) non-ambiguous is due to reordering => having the timestamp
of the last segment there is just skewing things more off
than does some good since the ack got triggered by one of
the holes (besides some substle issues that would make
determining right hole/skb even harder problem). Anyway,
it has nothing to do with this change then.
I choose to route some abnormal looking cases with goto noop,
some could be handled differently (eg., by stopping the
walking at that skb but again). In general, they either
shouldn't happen at all or are rare enough to make no difference
in practice.
In theory this change (as whole) could cause some macroscale
regression (global) because of cache misses that are taken over
the round-trip time but it gets very likely better because of much
less (local) cache misses per other write queue walkers and the
big recovery clearing cumulative ack.
Worth to note that these benefits would be very easy to get also
without TSO/GSO being on as long as the data is in pages so that
we can merge them. Currently I won't let that happen because
DSACK splitting at fragment that would mess up pcounts due to
sk_can_gso in tcp_set_skb_tso_segs. Once DSACKs fragments gets
avoided, we have some conditions that can be made less strict.
TODO: I will probably have to convert the excessive pointer
passing to struct sacktag_state... :-)
My testing revealed that considerable amount of skbs couldn't
be shifted because they were cloned (most likely still awaiting
tx reclaim)...
[The rest is considering future work instead since I got
repeatably EFAULT to tcpdump's recvfrom when I added
pskb_expand_head to deal with clones, so I separated that
into another, later patch]
...To counter that, I gave up on the fifth advantage:
5) When growing previous SACK block, less allocs for new skbs
are done, basically a new alloc is needed only when new hole
is detected and when the previous skb runs out of frags space
...which now only happens of if reclaim is fast enough to dispose
the clone before the SACK block comes in (the window is RTT long),
otherwise we'll have to alloc some.
With clones being handled I got these numbers (will be somewhat
worse without that), taken with fine-grained mibs:
TCPSackShifted 398
TCPSackMerged 877
TCPSackShiftFallback 320
TCPSACKCOLLAPSEFALLBACKGSO 0
TCPSACKCOLLAPSEFALLBACKSKBBITS 0
TCPSACKCOLLAPSEFALLBACKSKBDATA 0
TCPSACKCOLLAPSEFALLBACKBELOW 0
TCPSACKCOLLAPSEFALLBACKFIRST 1
TCPSACKCOLLAPSEFALLBACKPREVBITS 318
TCPSACKCOLLAPSEFALLBACKMSS 1
TCPSACKCOLLAPSEFALLBACKNOHEAD 0
TCPSACKCOLLAPSEFALLBACKSHIFT 0
TCPSACKCOLLAPSENOOPSEQ 0
TCPSACKCOLLAPSENOOPSMALLPCOUNT 0
TCPSACKCOLLAPSENOOPSMALLLEN 0
TCPSACKCOLLAPSEHOLE 12
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This is preparatory work for SACK combiner patch which may
have to count TCP state changes for only a part of the skb
because it will intentionally avoids splitting skb to SACKed
and not sacked parts.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Sadly enough, this adds possible divide though we try to avoid
it by checking one mss as common case.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
I knew already when rewriting the sacktag that this condition
was too conservative, change it now since it prevent lot of
useless work (especially in the sack shifter decision code
that is being added by a later patch). This shouldn't change
anything really, just save some processing regardless of the
shifter.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
I always had thought that collapsing up to two at a time was
intentional decision to avoid excessive processing if 1 byte
sized skbs are to be combined for a full mtu, and consecutive
retransmissions would make the size of the retransmittee
double each round anyway, but some recent discussion made me
to understand that was not the case. Thus make collapse work
more and wait less.
It would be possible to take advantage of the shifting
machinery (added in the later patch) in the case of paged
data but that can be implemented on top of this change.
tcp_skb_is_last check is now provided by the loop.
I tested a bit (ss-after-idle-off, fill 4096x4096B xfer,
10s sleep + 4096 x 1byte writes while dropping them for
some a while with netem):
. 16774097:16775545(1448) ack 1 win 46
. 16775545:16776993(1448) ack 1 win 46
. ack 16759617 win 2399
P 16776993:16777217(224) ack 1 win 46
. ack 16762513 win 2399
. ack 16765409 win 2399
. ack 16768305 win 2399
. ack 16771201 win 2399
. ack 16774097 win 2399
. ack 16776993 win 2399
. ack 16777217 win 2399
P 16777217:16777257(40) ack 1 win 46
. ack 16777257 win 2399
P 16777257:16778705(1448) ack 1 win 46
P 16778705:16780153(1448) ack 1 win 46
FP 16780153:16781313(1160) ack 1 win 46
. ack 16778705 win 2399
. ack 16780153 win 2399
F 1:1(0) ack 16781314 win 2399
While without drop-all period I get this:
. 16773585:16775033(1448) ack 1 win 46
. ack 16764897 win 9367
. ack 16767793 win 9367
. ack 16770689 win 9367
. ack 16773585 win 9367
. 16775033:16776481(1448) ack 1 win 46
P 16776481:16777217(736) ack 1 win 46
. ack 16776481 win 9367
. ack 16777217 win 9367
P 16777217:16777218(1) ack 1 win 46
P 16777218:16777219(1) ack 1 win 46
P 16777219:16777220(1) ack 1 win 46
...
P 16777247:16777248(1) ack 1 win 46
. ack 16777218 win 9367
. ack 16777219 win 9367
...
. ack 16777233 win 9367
. ack 16777248 win 9367
P 16777248:16778696(1448) ack 1 win 46
P 16778696:16780144(1448) ack 1 win 46
FP 16780144:16781313(1169) ack 1 win 46
. ack 16780144 win 9367
F 1:1(0) ack 16781314 win 9367
The window seems to be 30-40 segments, which were successfully
combined into: P 16777217:16777257(40) ack 1 win 46
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We can reduce pressure on dst entry refcount that slowdown UDP transmit
path on SMP machines. This pressure is visible on RTP servers when
delivering content to mediagateways, especially big ones, handling
thousand of streams. Several cpus send UDP frames to the same
destination, hence use the same dst entry.
This patch makes ip_push_pending_frames() steal the refcount its
callers had to take when filling inet->cork.dst.
This doesnt avoid all refcounting, but still gives speedups on SMP,
on UDP/RAW transmit path.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We can reduce pressure on dst entry refcount that slowdown UDP transmit
path on SMP machines. This pressure is visible on RTP servers when
delivering content to mediagateways, especially big ones, handling
thousand of streams. Several cpus send UDP frames to the same
destination, hence use the same dst entry.
This patch makes ip_append_data() eventually steal the refcount its
callers had to take on the dst entry.
This doesnt avoid all refcounting, but still gives speedups on SMP,
on UDP/RAW transmit path
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The rule of calling sock_prot_inuse_add() is that BHs must
be disabled. Some new calls were added where this was not
true and this tiggers warnings as reported by Ilpo.
Fix this by adding explicit BH disabling around those call sites,
or moving sock_prot_inuse_add() call inside an existing BH disabled
section.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
dev_net_set() should be the very first thing after alloc_netdev().
"ndo_" changes turned simple assignment (which is OK to do before netns
assignment) into quite non-trivial operation (which is not OK, init_net was
used). This leads to incomplete initialisation of tunnel device in netns.
BUG: unable to handle kernel NULL pointer dereference at 00000004
IP: [<c02efdb5>] ip6_tnl_exit_net+0x37/0x4f
*pde = 00000000
Oops: 0000 [#1] PREEMPT DEBUG_PAGEALLOC
last sysfs file: /sys/class/net/lo/operstate
Pid: 10, comm: netns Not tainted (2.6.28-rc6 #1)
EIP: 0060:[<c02efdb5>] EFLAGS: 00010246 CPU: 0
EIP is at ip6_tnl_exit_net+0x37/0x4f
EAX: 00000000 EBX: 00000020 ECX: 00000000 EDX: 00000003
ESI: c5caef30 EDI: c782bbe8 EBP: c7909f50 ESP: c7909f48
DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
Process netns (pid: 10, ti=c7908000 task=c7905780 task.ti=c7908000)
Stack:
c03e75e0 c7390bc8 c7909f60 c0245448 c7390bd8 c7390bf0 c7909fa8 c012577a
00000000 00000002 00000000 c0125736 c782bbe8 c7909f90 c0308fe3 c782bc04
c7390bd4 c0245406 c084b718 c04f0770 c03ad785 c782bbe8 c782bc04 c782bc0c
Call Trace:
[<c0245448>] ? cleanup_net+0x42/0x82
[<c012577a>] ? run_workqueue+0xd6/0x1ae
[<c0125736>] ? run_workqueue+0x92/0x1ae
[<c0308fe3>] ? schedule+0x275/0x285
[<c0245406>] ? cleanup_net+0x0/0x82
[<c0125ae1>] ? worker_thread+0x81/0x8d
[<c0128344>] ? autoremove_wake_function+0x0/0x33
[<c0125a60>] ? worker_thread+0x0/0x8d
[<c012815c>] ? kthread+0x39/0x5e
[<c0128123>] ? kthread+0x0/0x5e
[<c0103b9f>] ? kernel_thread_helper+0x7/0x10
Code: db e8 05 ff ff ff 89 c6 e8 dc 04 f6 ff eb 08 8b 40 04 e8 38 89 f5 ff 8b 44 9e 04 85 c0 75 f0 43 83 fb 20 75 f2 8b 86 84 00 00 00 <8b> 40 04 e8 1c 89 f5 ff e8 98 04 f6 ff 89 f0 e8 f8 63 e6 ff 5b
EIP: [<c02efdb5>] ip6_tnl_exit_net+0x37/0x4f SS:ESP 0068:c7909f48
---[ end trace 6c2f2328fccd3e0c ]---
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This is the last step to be able to perform full RCU lookups
in __inet_lookup() : After established/timewait tables, we
add RCU lookups to listening hash table.
The only trick here is that a socket of a given type (TCP ipv4,
TCP ipv6, ...) can now flight between two different tables
(established and listening) during a RCU grace period, so we
must use different 'nulls' end-of-chain values for two tables.
We define a large value :
#define LISTENING_NULLS_BASE (1U << 29)
So that slots in listening table are guaranteed to have different
end-of-chain values than slots in established table. A reader can
still detect it finished its lookup in the right chain.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
1) Use eq_net() in inet_netns_ok() to speedup socket creation if
!CONFIG_NET_NS
2) Reorder the tests about inet_ehash_secret generation (once only)
Use the unlikely() macro when testing if inet_ehash_secret already
generated.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch fixes http://bugzilla.kernel.org/show_bug.cgi?id=12014
Since most (if not all) implementations of TSO and even the in-kernel
software GSO do not update the urgent pointer when splitting a large
segment, it is necessary to turn off TSO/GSO for all outgoing traffic
with the URG pointer set.
Looking at tcp_current_mss (and the preceding comment) I even think
this was the original intention. However, this approach is insufficient,
because TSO/GSO is turned off only for newly created frames, not for
frames which were already pending at the arrival of a message with
MSG_OOB set. These frames were created when TSO/GSO was enabled,
so they may be large, and they will have the urgent pointer set
in tcp_transmit_skb().
With this patch, such large packets will be fragmented again before
going to the transmit routine.
As a side note, at least the following NICs are known to screw up
the urgent pointer in the TCP header when doing TSO:
Intel 82566MM (PCI ID 8086:1049)
Intel 82566DC (PCI ID 8086:104b)
Intel 82541GI (PCI ID 8086:1076)
Broadcom NetXtreme II BCM5708 (PCI ID 14e4:164c)
Signed-off-by: Petr Tesarik <ptesarik@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
They are no longer a rwlocks.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Now TCP & DCCP use RCU lookups, we can convert ehash rwlocks to spinlocks.
/proc/net/tcp and other seq_file 'readers' can safely be converted to 'writers'.
This should speedup writers, since spin_lock()/spin_unlock()
only use one atomic operation instead of two for write_lock()/write_unlock()
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Convert ipgre tunnel to netdevice ops.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Convert to network device ops. Needed to change to directly call
the init routine since two sides share same ops. In the process
found by inspection a device ref count leak if register_netdevice failed.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Convert to new network device ops interface.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
drivers/net/ixgbe/ixgbe_main.c
include/net/mac80211.h
net/phonet/af_phonet.c
|
|
Fixes sparse warnings:
net/ipv4/ip_sockglue.c:146:15: warning: incorrect type in assignment (different base types)
net/ipv4/ip_sockglue.c:146:15: expected restricted __be16 [assigned] [usertype] sin_port
net/ipv4/ip_sockglue.c:146:15: got unsigned short [unsigned] [short] [usertype] <noident>
net/ipv4/ip_sockglue.c:130:6: warning: symbol 'ip_cmsg_recv_dstaddr' was not declared. Should it be static?
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
inet_sk_rebuild_header() does a new route lookup if the dst_entry
associated with a socket becomes stale. However inet_sk_rebuild_header()
didn't use struct flowi->flags, causing the route lookup to
fail for foreign-bound IP_TRANSPARENT sockets, causing an error
state to be set for the sockets in question.
Signed-off-by: Balazs Scheidler <bazsi@balabit.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
udp_sendmsg() didn't fill struct flowi->flags, which means that
the route lookup would fail for non-local IPs even if the
IP_TRANSPARENT sockopt was set.
This prevents sendto() to work properly for UDP sockets, whereas
bind(foreign-ip) + connect() + send() worked fine.
Signed-off-by: Balazs Scheidler <bazsi@balabit.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch prepares RCU migration of listening_hash table for
TCP/DCCP protocols.
listening_hash table being small (32 slots per protocol), we add
a spinlock for each slot, instead of a single rwlock for whole table.
This should reduce hold time of readers, and writers concurrency.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Convert to net_device_ops function table pointer for ioctl.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The first argument to csum_partial is const void *
casts to char/u8 * are not necessary
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Similarly to IPv6 ip6_mr_init() (fixed last week), the order of cleanup
operations in the error/exit section of ip_mr_init() is completely
inversed. It should be the other way around.
Also a del_timer() is missing in the error path.
I should have guessed last week that this same error existed in ipmr.c
too, as ip6mr.c is largely inspired by ipmr.c.
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
drivers/isdn/i4l/isdn_net.c
fs/cifs/connect.c
|
|
RCU was added to UDP lookups, using a fast infrastructure :
- sockets kmem_cache use SLAB_DESTROY_BY_RCU and dont pay the
price of call_rcu() at freeing time.
- hlist_nulls permits to use few memory barriers.
This patch uses same infrastructure for TCP/DCCP established
and timewait sockets.
Thanks to SLAB_DESTROY_BY_RCU, no slowdown for applications
using short lived TCP connections. A followup patch, converting
rwlocks to spinlocks will even speedup this case.
__inet_lookup_established() is pretty fast now we dont have to
dirty a contended cache line (read_lock/read_unlock)
Only established and timewait hashtable are converted to RCU
(bind table and listen table are still using traditional locking)
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This is a straightforward patch, using hlist_nulls infrastructure.
RCUification already done on UDP two weeks ago.
Using hlist_nulls permits us to avoid some memory barriers, both
at lookup time and delete time.
Patch is large because it adds new macros to include/net/sock.h.
These macros will be used by TCP & DCCP in next patch.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In case UDP traffic is redirected to a local UDP socket,
the originally addressed destination address/port
cannot be recovered with the in-kernel tproxy.
This patch adds an IP_RECVORIGDSTADDR sockopt that enables
a IP_ORIGDSTADDR ancillary message in recvmsg(). This
ancillary message contains the original destination address/port
of the packet being received.
Signed-off-by: Balazs Scheidler <bazsi@balabit.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Ben Greear wrote:
> I have 500 mac-vlans on a system talking to 500 other
> mac-vlans. My problem is that the arp-table gets extremely
> huge because every time an arp-request comes in on all mac-vlans,
> a stale arp entry is added for each mac-vlan. I have filtering
> turned on, but that doesn't help because the neigh_event_ns call
> below will cause a stale neighbor entry to be created regardless
> of whether a replay will be sent or not.
> Maybe the neigh_event code should be below the checks for dont_send,
> and only create check neigh_event_ns if we are !dont_send?
The attached patch makes it work much better for me. The patch
will cause the code to NOT create a stale neighbor entry if we
are not going to respond to the ARP request. The old code
*would* create a stale entry even if we are not going to respond.
Signed-off-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|