aboutsummaryrefslogtreecommitdiff
path: root/net/ipv4
AgeCommit message (Collapse)Author
2012-01-22tcp: md5: using remote adress for md5 lookup in rst packetshawnlu
md5 key is added in socket through remote address. remote address should be used in finding md5 key when sending out reset packet. Signed-off-by: shawnlu <shawn.lu@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-22tcp: detect loss above high_seq in recoveryYuchung Cheng
Correctly implement a loss detection heuristic: New sequences (above high_seq) sent during the fast recovery are deemed lost when higher sequences are SACKed. Current code does not catch these losses, because tcp_mark_head_lost() does not check packets beyond high_seq. The fix is straight-forward by checking packets until the highest sacked packet. In addition, all the FLAG_DATA_LOST logic are in-effective and redundant and can be removed. Update the loss heuristic comments. The algorithm above is documented as heuristic B, but it is redundant too because heuristic A already covers B. Note that this change only marks some forward-retransmitted packets LOST. It does NOT forbid TCP performing further CWR on new losses. A potential follow-up patch under preparation is to perform another CWR on "new" losses such as 1) sequence above high_seq is lost (by resetting high_seq to snd_nxt) 2) retransmission is lost. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-20tcp: fix undo after RTO for CUBICNeal Cardwell
This patch fixes CUBIC so that cwnd reductions made during RTOs can be undone (just as they already can be undone when using the default/Reno behavior). When undoing cwnd reductions, BIC-derived congestion control modules were restoring the cwnd from last_max_cwnd. There were two problems with using last_max_cwnd to restore a cwnd during undo: (a) last_max_cwnd was set to 0 on state transitions into TCP_CA_Loss (by calling the module's reset() functions), so cwnd reductions from RTOs could not be undone. (b) when fast_covergence is enabled (which it is by default) last_max_cwnd does not actually hold the value of snd_cwnd before the loss; instead, it holds a scaled-down version of snd_cwnd. This patch makes the following changes: (1) upon undo, revert snd_cwnd to ca->loss_cwnd, which is already, as the existing comment notes, the "congestion window at last loss" (2) stop forgetting ca->loss_cwnd on TCP_CA_Loss events (3) use ca->last_max_cwnd to check if we're in slow start Signed-off-by: Neal Cardwell <ncardwell@google.com> Acked-by: Stephen Hemminger <shemminger@vyatta.com> Acked-by: Sangtae Ha <sangtae.ha@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-20tcp: fix undo after RTO for BICNeal Cardwell
This patch fixes BIC so that cwnd reductions made during RTOs can be undone (just as they already can be undone when using the default/Reno behavior). When undoing cwnd reductions, BIC-derived congestion control modules were restoring the cwnd from last_max_cwnd. There were two problems with using last_max_cwnd to restore a cwnd during undo: (a) last_max_cwnd was set to 0 on state transitions into TCP_CA_Loss (by calling the module's reset() functions), so cwnd reductions from RTOs could not be undone. (b) when fast_covergence is enabled (which it is by default) last_max_cwnd does not actually hold the value of snd_cwnd before the loss; instead, it holds a scaled-down version of snd_cwnd. This patch makes the following changes: (1) upon undo, revert snd_cwnd to ca->loss_cwnd, which is already, as the existing comment notes, the "congestion window at last loss" (2) stop forgetting ca->loss_cwnd on TCP_CA_Loss events (3) use ca->last_max_cwnd to check if we're in slow start Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-17Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (47 commits) tg3: Fix single-vector MSI-X code openvswitch: Fix multipart datapath dumps. ipv6: fix per device IP snmp counters inetpeer: initialize ->redirect_genid in inet_getpeer() net: fix NULL-deref in WARN() in skb_gso_segment() net: WARN if skb_checksum_help() is called on skb requiring segmentation caif: Remove bad WARN_ON in caif_dev caif: Fix typo in Vendor/Product-ID for CAIF modems bnx2x: Disable AN KR work-around for BCM57810 bnx2x: Remove AutoGrEEEn for BCM84833 bnx2x: Remove 100Mb force speed for BCM84833 bnx2x: Fix PFC setting on BCM57840 bnx2x: Fix Super-Isolate mode for BCM84833 net: fix some sparse errors net: kill duplicate included header net: sh-eth: Fix build error by the value which is not defined net: Use device model to get driver name in skb_gso_segment() bridge: BH already disabled in br_fdb_cleanup() net: move sock_update_memcg outside of CONFIG_INET mwl8k: Fixing Sparse ENDIAN CHECK warning ...
2012-01-17inetpeer: initialize ->redirect_genid in inet_getpeer()Dan Carpenter
kmemcheck complains that ->redirect_genid doesn't get initialized. Presumably it should be set to zero. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-17net: fix some sparse errorsEric Dumazet
make C=2 CF="-D__CHECK_ENDIAN__" M=net And fix flowi4_init_output() prototype for sport Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-17net: kill duplicate included headerShan Wei
For net part, remove duplicate included header. Signed-off-by: Shan Wei <davidshan@tencent.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-14Merge branch 'for-linus' of git://selinuxproject.org/~jmorris/linux-securityLinus Torvalds
* 'for-linus' of git://selinuxproject.org/~jmorris/linux-security: capabilities: remove __cap_full_set definition security: remove the security_netlink_recv hook as it is equivalent to capable() ptrace: do not audit capability check when outputing /proc/pid/stat capabilities: remove task_ns_* functions capabitlies: ns_capable can use the cap helpers rather than lsm call capabilities: style only - move capable below ns_capable capabilites: introduce new has_ns_capabilities_noaudit capabilities: call has_ns_capability from has_capability capabilities: remove all _real_ interfaces capabilities: introduce security_capable_noaudit capabilities: reverse arguments to security_capable capabilities: remove the task from capable LSM hook entirely selinux: sparse fix: fix several warnings in the security server cod selinux: sparse fix: fix warnings in netlink code selinux: sparse fix: eliminate warnings for selinuxfs selinux: sparse fix: declare selinux_disable() in security.h selinux: sparse fix: move selinux_complete_init selinux: sparse fix: make selinux_secmark_refcount static SELinux: Fix RCU deref check warning in sel_netport_insert() Manually fix up a semantic mis-merge wrt security_netlink_recv(): - the interface was removed in commit fd7784615248 ("security: remove the security_netlink_recv hook as it is equivalent to capable()") - a new user of it appeared in commit a38f7907b926 ("crypto: Add userspace configuration API") causing no automatic merge conflict, but Eric Paris pointed out the issue.
2012-01-12net: decrement memcg jump label when limit, not usage, is changedGlauber Costa
The logic of the current code is that whenever we destroy a cgroup that had its limit set (set meaning different than maximum), we should decrement the jump_label counter. Otherwise we assume it was never incremented. But what the code actually does is test for RES_USAGE instead of RES_LIMIT. Usage being different than maximum is likely to be true most of the time. The effect of this is that the key must become negative, and since the jump_label test says: !!atomic_read(&key->enabled); we'll have jump_labels still on when no one else is using this functionality. Signed-off-by: Glauber Costa <glommer@parallels.com> CC: David S. Miller <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-12net: reintroduce missing rcu_assign_pointer() callsEric Dumazet
commit a9b3cd7f32 (rcu: convert uses of rcu_assign_pointer(x, NULL) to RCU_INIT_POINTER) did a lot of incorrect changes, since it did a complete conversion of rcu_assign_pointer(x, y) to RCU_INIT_POINTER(x, y). We miss needed barriers, even on x86, when y is not NULL. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Stephen Hemminger <shemminger@vyatta.com> CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-11inet_diag: Rename inet_diag_req_compat into inet_diag_reqPavel Emelyanov
Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-11inet_diag: Rename inet_diag_req into inet_diag_req_v2Pavel Emelyanov
Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: igmp: Avoid zero delay when receiving odd mixture of IGMP queries netdev: make net_device_ops const bcm63xx: make ethtool_ops const usbnet: make ethtool_ops const net: Fix build with INET disabled. net: introduce netif_addr_lock_nested() and call if when appropriate net: correct lock name in dev_[uc/mc]_sync documentations. net: sk_update_clone is only used in net/core/sock.c 8139cp: fix missing napi_gro_flush. pktgen: set correct max and min in pktgen_setup_inject() smsc911x: Unconditionally include linux/smscphy.h in smsc911x.h asix: fix infinite loop in rx_fixup() net: Default UDP and UNIX diag to 'n'. r6040: fix typo in use of MCR0 register bits net: fix sock_clone reference mismatch with tcp memcontrol
2012-01-09igmp: Avoid zero delay when receiving odd mixture of IGMP queriesBen Hutchings
Commit 5b7c84066733c5dfb0e4016d939757b38de189e4 ('ipv4: correct IGMP behavior on v3 query during v2-compatibility mode') added yet another case for query parsing, which can result in max_delay = 0. Substitute a value of 1, as in the usual v3 case. Reported-by: Simon McVittie <smcv@debian.org> References: http://bugs.debian.org/654876 Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-08Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (53 commits) Kconfig: acpi: Fix typo in comment. misc latin1 to utf8 conversions devres: Fix a typo in devm_kfree comment btrfs: free-space-cache.c: remove extra semicolon. fat: Spelling s/obsolate/obsolete/g SCSI, pmcraid: Fix spelling error in a pmcraid_err() call tools/power turbostat: update fields in manpage mac80211: drop spelling fix types.h: fix comment spelling for 'architectures' typo fixes: aera -> area, exntension -> extension devices.txt: Fix typo of 'VMware'. sis900: Fix enum typo 'sis900_rx_bufer_status' decompress_bunzip2: remove invalid vi modeline treewide: Fix comment and string typo 'bufer' hyper-v: Update MAINTAINERS treewide: Fix typos in various parts of the kernel, and fix some comments. clockevents: drop unknown Kconfig symbol GENERIC_CLOCKEVENTS_MIGR gpio: Kconfig: drop unknown symbol 'CS5535_GPIO' leds: Kconfig: Fix typo 'D2NET_V2' sound: Kconfig: drop unknown symbol ARCH_CLPS7500 ... Fix up trivial conflicts in arch/powerpc/platforms/40x/Kconfig (some new kconfig additions, close to removed commented-out old ones)
2012-01-07net: Default UDP and UNIX diag to 'n'.David S. Miller
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-05security: remove the security_netlink_recv hook as it is equivalent to capable()Eric Paris
Once upon a time netlink was not sync and we had to get the effective capabilities from the skb that was being received. Today we instead get the capabilities from the current task. This has rendered the entire purpose of the hook moot as it is now functionally equivalent to the capable() call. Signed-off-by: Eric Paris <eparis@redhat.com>
2011-12-30inet_diag: Add the SKMEMINFO extensionPavel Emelyanov
Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-28ipv6: Use universal hash for NDISC.David S. Miller
In order to perform a proper universal hash on a vector of integers, we have to use different universal hashes on each vector element. Which means we need 4 different hash randoms for ipv6. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-27netfilter: provide config option to disable ancient procfs partsJan Engelhardt
Using /proc/net/nf_conntrack has been deprecated in favour of the conntrack(8) tool. Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2011-12-27netfilter: xtables: move ipt_ecn to xt_ecnJan Engelhardt
Prepare the ECN match for augmentation by an IPv6 counterpart. Since no symbol dependencies to ipv6.ko are added, having a single ecn match module is the more so welcome. Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2011-12-25Merge branch 'nf-next' of git://1984.lsi.us.es/net-nextDavid S. Miller
2011-12-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: net/bluetooth/l2cap_core.c Just two overlapping changes, one added an initialization of a local variable, and another change added a new local variable. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-23netfilter: ctnetlink: remove dead NAT codePatrick McHardy
The NAT range to nlattr conversation callbacks and helpers are entirely dead code and are also useless since there are no NAT ranges in conntrack context, they are only used for initially selecting a tuple. The final NAT information is contained in the selected tuples of the conntrack entry. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2011-12-23netfilter: nf_nat: remove obsolete check in nf_nat_mangle_udp_packet()Patrick McHardy
The packet size check originates from a time when UDP helpers could accidentally mangle incorrect packets (NEWNAT) and is unnecessary nowadays since the conntrack helpers invoke the NAT helpers for the proper packet directly. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2011-12-23netfilter: nf_nat: remove obsolete code from nf_nat_icmp_reply_translation()Patrick McHardy
The inner tuple that is extracted from the packet is unused. The code also doesn't have any useful side-effects like verifying the packet does contain enough data to extract the inner tuple since conntrack already does the same, so remove it. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2011-12-23netfilter: nat: remove module reference counting from NAT protocolsPatrick McHardy
The only remaining user of NAT protocol module reference counting is NAT ctnetlink support. Since this is a fairly short sequence of code, convert over to use RCU and remove module reference counting. Module unregistration is already protected by RCU using synchronize_rcu(), so no further changes are necessary. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2011-12-23netfilter: nf_nat: add missing nla_policy entry for CTA_NAT_PROTO attributePatrick McHardy
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2011-12-23netfilter: nf_nat: use hash random for bysource hashPatrick McHardy
Use nf_conntrack_hash_rnd in NAT bysource hash to avoid hash chain attacks. Signed-off-by: Patrick McHardy <kaber@trash.net> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2011-12-23netfilter: nf_nat: export NAT definitions to userspacePatrick McHardy
Export the NAT definitions to userspace. So far userspace (specifically, iptables) has been copying the headers files from include/net. Also rename some structures and definitions in preparation for IPv6 NAT. Since these have never been officially exported, this doesn't affect existing userspace code. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2011-12-22net: introduce DST_NOPEER dst flagEric Dumazet
Chris Boot reported crashes occurring in ipv6_select_ident(). [ 461.457562] RIP: 0010:[<ffffffff812dde61>] [<ffffffff812dde61>] ipv6_select_ident+0x31/0xa7 [ 461.578229] Call Trace: [ 461.580742] <IRQ> [ 461.582870] [<ffffffff812efa7f>] ? udp6_ufo_fragment+0x124/0x1a2 [ 461.589054] [<ffffffff812dbfe0>] ? ipv6_gso_segment+0xc0/0x155 [ 461.595140] [<ffffffff812700c6>] ? skb_gso_segment+0x208/0x28b [ 461.601198] [<ffffffffa03f236b>] ? ipv6_confirm+0x146/0x15e [nf_conntrack_ipv6] [ 461.608786] [<ffffffff81291c4d>] ? nf_iterate+0x41/0x77 [ 461.614227] [<ffffffff81271d64>] ? dev_hard_start_xmit+0x357/0x543 [ 461.620659] [<ffffffff81291cf6>] ? nf_hook_slow+0x73/0x111 [ 461.626440] [<ffffffffa0379745>] ? br_parse_ip_options+0x19a/0x19a [bridge] [ 461.633581] [<ffffffff812722ff>] ? dev_queue_xmit+0x3af/0x459 [ 461.639577] [<ffffffffa03747d2>] ? br_dev_queue_push_xmit+0x72/0x76 [bridge] [ 461.646887] [<ffffffffa03791e3>] ? br_nf_post_routing+0x17d/0x18f [bridge] [ 461.653997] [<ffffffff81291c4d>] ? nf_iterate+0x41/0x77 [ 461.659473] [<ffffffffa0374760>] ? br_flood+0xfa/0xfa [bridge] [ 461.665485] [<ffffffff81291cf6>] ? nf_hook_slow+0x73/0x111 [ 461.671234] [<ffffffffa0374760>] ? br_flood+0xfa/0xfa [bridge] [ 461.677299] [<ffffffffa0379215>] ? nf_bridge_update_protocol+0x20/0x20 [bridge] [ 461.684891] [<ffffffffa03bb0e5>] ? nf_ct_zone+0xa/0x17 [nf_conntrack] [ 461.691520] [<ffffffffa0374760>] ? br_flood+0xfa/0xfa [bridge] [ 461.697572] [<ffffffffa0374812>] ? NF_HOOK.constprop.8+0x3c/0x56 [bridge] [ 461.704616] [<ffffffffa0379031>] ? nf_bridge_push_encap_header+0x1c/0x26 [bridge] [ 461.712329] [<ffffffffa037929f>] ? br_nf_forward_finish+0x8a/0x95 [bridge] [ 461.719490] [<ffffffffa037900a>] ? nf_bridge_pull_encap_header+0x1c/0x27 [bridge] [ 461.727223] [<ffffffffa0379974>] ? br_nf_forward_ip+0x1c0/0x1d4 [bridge] [ 461.734292] [<ffffffff81291c4d>] ? nf_iterate+0x41/0x77 [ 461.739758] [<ffffffffa03748cc>] ? __br_deliver+0xa0/0xa0 [bridge] [ 461.746203] [<ffffffff81291cf6>] ? nf_hook_slow+0x73/0x111 [ 461.751950] [<ffffffffa03748cc>] ? __br_deliver+0xa0/0xa0 [bridge] [ 461.758378] [<ffffffffa037533a>] ? NF_HOOK.constprop.4+0x56/0x56 [bridge] This is caused by bridge netfilter special dst_entry (fake_rtable), a special shared entry, where attaching an inetpeer makes no sense. Problem is present since commit 87c48fa3b46 (ipv6: make fragment identifications less predictable) Introduce DST_NOPEER dst flag and make sure ipv6_select_ident() and __ip_select_ident() fallback to the 'no peer attached' handling. Reported-by: Chris Boot <bootc@bootc.net> Tested-by: Chris Boot <bootc@bootc.net> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-22ipv4: using prefetch requires including prefetch.hStephen Rothwell
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: David Miller <davem@davemloft.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-12-21ipv4: reintroduce route cache garbage collectorEric Dumazet
Commit 2c8cec5c10b (ipv4: Cache learned PMTU information in inetpeer) removed IP route cache garbage collector a bit too soon, as this gc was responsible for expired routes cleanup, releasing their neighbour reference. As pointed out by Robert Gladewitz, recent kernels can fill and exhaust their neighbour cache. Reintroduce the garbage collection, since we'll have to wait our neighbour lookups become refcount-less to not depend on this stuff. Reported-by: Robert Gladewitz <gladewitz@gmx.de> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-21tcp: Replace constants with #define macrosVijay Subramanian
to record the state of SACK/FACK and DSACK for better readability and maintenance. Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-20net: have ipconfig not wait if no dev is availableGerlando Falauto
previous commit 3fb72f1e6e6165c5f495e8dc11c5bbd14c73385c makes IP-Config wait for carrier on at least one network device. Before waiting (predefined value 120s), check that at least one device was successfully brought up. Otherwise (e.g. buggy bootloader which does not set the MAC address) there is no point in waiting for carrier. Cc: Micha Nelissen <micha@neli.hopto.org> Cc: Holger Brunck <holger.brunck@keymile.com> Signed-off-by: Gerlando Falauto <gerlando.falauto@keymile.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-19module_param: make bool parameters really bool (net & drivers/net)Rusty Russell
module_param(bool) used to counter-intuitively take an int. In fddd5201 (mid-2009) we allowed bool or int/unsigned int using a messy trick. It's time to remove the int/unsigned int option. For this version it'll simply give a warning, but it'll break next kernel version. (Thanks to Joe Perches for suggesting coccinelle for 0/1 -> true/false). Cc: "David S. Miller" <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-19net: fix assignment of 0/1 to bool variables.Rusty Russell
DaveM said: Please, this kind of stuff rots forever and not using bool properly drives me crazy. Joe Perches <joe@perches.com> gave me the spatch script: @@ bool b; @@ -b = 0 +b = false @@ bool b; @@ -b = 1 +b = true I merely installed coccinelle, read the documentation and took credit. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-16sock_diag: Generalize requests cookies managementsPavel Emelyanov
The sk address is used as a cookie between dump/get_exact calls. It will be required for unix socket sdumping, so move it from inet_diag to sock_diag. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-16sock_diag: Fix module netlink aliasesPavel Emelyanov
I've made a mistake when fixing the sock_/inet_diag aliases :( 1. The sock_diag layer should request the family-based alias, not just the IPPROTO_IP one; 2. The inet_diag layer should request for AF_INET+protocol alias, not just the protocol one. Thus fix this. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-16Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: drivers/net/ethernet/freescale/fsl_pq_mdio.c net/batman-adv/translation-table.c net/ipv6/route.c
2011-12-15tcp_memcontrol: fix reversed if conditionDan Carpenter
We should only dereference the pointer if it's valid, not the other way round. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-14net: ping: remove some sparse errorsEric Dumazet
net/ipv4/sysctl_net_ipv4.c:78:6: warning: symbol 'inet_get_ping_group_range_table' was not declared. Should it be static? net/ipv4/sysctl_net_ipv4.c:119:31: warning: incorrect type in argument 2 (different signedness) net/ipv4/sysctl_net_ipv4.c:119:31: expected int *range net/ipv4/sysctl_net_ipv4.c:119:31: got unsigned int *<noident> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-12Display maximum tcp memory allocation in kmem cgroupGlauber Costa
This patch introduces kmem.tcp.max_usage_in_bytes file, living in the kmem_cgroup filesystem. The root cgroup will display a value equal to RESOURCE_MAX. This is to avoid introducing any locking schemes in the network paths when cgroups are not being actively used. All others, will see the maximum memory ever used by this cgroup. Signed-off-by: Glauber Costa <glommer@parallels.com> Reviewed-by: Hiroyouki Kamezawa <kamezawa.hiroyu@jp.fujitsu.com> CC: David S. Miller <davem@davemloft.net> CC: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-12Display current tcp failcnt in kmem cgroupGlauber Costa
This patch introduces kmem.tcp.failcnt file, living in the kmem_cgroup filesystem. Following the pattern in the other memcg resources, this files keeps a counter of how many times allocation failed due to limits being hit in this cgroup. The root cgroup will always show a failcnt of 0. Signed-off-by: Glauber Costa <glommer@parallels.com> Reviewed-by: Hiroyouki Kamezawa <kamezawa.hiroyu@jp.fujitsu.com> CC: David S. Miller <davem@davemloft.net> CC: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-12Display current tcp memory allocation in kmem cgroupGlauber Costa
This patch introduces kmem.tcp.usage_in_bytes file, living in the kmem_cgroup filesystem. It is a simple read-only file that displays the amount of kernel memory currently consumed by the cgroup. Signed-off-by: Glauber Costa <glommer@parallels.com> Reviewed-by: Hiroyouki Kamezawa <kamezawa.hiroyu@jp.fujitsu.com> CC: David S. Miller <davem@davemloft.net> CC: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-12tcp buffer limitation: per-cgroup limitGlauber Costa
This patch uses the "tcp.limit_in_bytes" field of the kmem_cgroup to effectively control the amount of kernel memory pinned by a cgroup. This value is ignored in the root cgroup, and in all others, caps the value specified by the admin in the net namespaces' view of tcp_sysctl_mem. If namespaces are being used, the admin is allowed to set a value bigger than cgroup's maximum, the same way it is allowed to set pretty much unlimited values in a real box. Signed-off-by: Glauber Costa <glommer@parallels.com> Reviewed-by: Hiroyouki Kamezawa <kamezawa.hiroyu@jp.fujitsu.com> CC: David S. Miller <davem@davemloft.net> CC: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-12per-netns ipv4 sysctl_tcp_memGlauber Costa
This patch allows each namespace to independently set up its levels for tcp memory pressure thresholds. This patch alone does not buy much: we need to make this values per group of process somehow. This is achieved in the patches that follows in this patchset. Signed-off-by: Glauber Costa <glommer@parallels.com> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> CC: David S. Miller <davem@davemloft.net> CC: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-12tcp memory pressure controlsGlauber Costa
This patch introduces memory pressure controls for the tcp protocol. It uses the generic socket memory pressure code introduced in earlier patches, and fills in the necessary data in cg_proto struct. Signed-off-by: Glauber Costa <glommer@parallels.com> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujtisu.com> CC: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-12foundations of per-cgroup memory pressure controlling.Glauber Costa
This patch replaces all uses of struct sock fields' memory_pressure, memory_allocated, sockets_allocated, and sysctl_mem to acessor macros. Those macros can either receive a socket argument, or a mem_cgroup argument, depending on the context they live in. Since we're only doing a macro wrapping here, no performance impact at all is expected in the case where we don't have cgroups disabled. Signed-off-by: Glauber Costa <glommer@parallels.com> Reviewed-by: Hiroyouki Kamezawa <kamezawa.hiroyu@jp.fujitsu.com> CC: David S. Miller <davem@davemloft.net> CC: Eric W. Biederman <ebiederm@xmission.com> CC: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>