aboutsummaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2014-01-13sched, net: Clean up preempt_enable_no_resched() abusePeter Zijlstra
The only valid use of preempt_enable_no_resched() is if the very next line is schedule() or if we know preemption cannot actually be enabled by that statement due to known more preempt_count 'refs'. Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: rjw@rjwysocki.net Cc: Eliezer Tamir <eliezer.tamir@linux.intel.com> Cc: David S. Miller <davem@davemloft.net> Cc: rui.zhang@intel.com Cc: jacob.jun.pan@linux.intel.com Cc: Mike Galbraith <bitbucket@online.de> Cc: hpa@zytor.com Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: lenb@kernel.org Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/20131119151338.GF3694@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-10net: core: explicitly select a txq before doing l2 forwardingJason Wang
Currently, the tx queue were selected implicitly in ndo_dfwd_start_xmit(). The will cause several issues: - NETIF_F_LLTX were removed for macvlan, so txq lock were done for macvlan instead of lower device which misses the necessary txq synchronization for lower device such as txq stopping or frozen required by dev watchdog or control path. - dev_hard_start_xmit() was called with NULL txq which bypasses the net device watchdog. - dev_hard_start_xmit() does not check txq everywhere which will lead a crash when tso is disabled for lower device. Fix this by explicitly introducing a new param for .ndo_select_queue() for just selecting queues in the case of l2 forwarding offload. netdev_pick_tx() was also extended to accept this parameter and dev_queue_xmit_accel() was used to do l2 forwarding transmission. With this fixes, NETIF_F_LLTX could be preserved for macvlan and there's no need to check txq against NULL in dev_hard_start_xmit(). Also there's no need to keep a dedicated ndo_dfwd_start_xmit() and we can just reuse the code of dev_queue_xmit() to do the transmission. In the future, it was also required for macvtap l2 forwarding support since it provides a necessary synchronization method. Cc: John Fastabend <john.r.fastabend@intel.com> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: e1000-devel@lists.sourceforge.net Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-10Merge branch 'for-davem' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless John W. Linville says: ==================== For the mac80211 bits, Johannes says: "I have a fix from Javier for mac80211_hwsim when used with wmediumd userspace, and a fix from Felix for buffering in AP mode." For the NFC bits, Samuel says: "This pull request only contains one fix for a regression introduced with commit e29a9e2ae165620d. Without this fix, we can not establish a p2p link in target mode. Only initiator mode works." For the iwlwifi bits, Emmanuel says: "It only includes new device IDs so it's not vital. If you have a pull request to net.git anyway, I'd happy to have this in." ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-09ipv6: add link-local, sit and loopback address with INFINITY_LIFE_TIMEHannes Frederic Sowa
In the past the IFA_PERMANENT flag indicated, that the valid and preferred lifetime where ignored. Since change fad8da3e085ddf ("ipv6 addrconf: fix preferred lifetime state-changing behavior while valid_lft is infinity") we honour at least the preferred lifetime on those addresses. As such the valid lifetime gets recalculated and updated to 0. If loopback address is added manually this problem does not occur. Also if NetworkManager manages IPv6, those addresses will get added via inet6_rtm_newaddr and thus will have a correct lifetime, too. Reported-by: François-Xavier Le Bail <fx.lebail@yahoo.com> Reported-by: Damien Wyart <damien.wyart@gmail.com> Fixes: fad8da3e085ddf ("ipv6 addrconf: fix preferred lifetime state-changing behavior while valid_lft is infinity") Cc: Yasushi Asano <yasushi.asano@jp.fujitsu.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-09Merge branch 'master' of ↵John W. Linville
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem
2014-01-08Merge tag 'nfc-fixes-3.13-1' of ↵John W. Linville
git://git.kernel.org/pub/scm/linux/kernel/git/sameo/nfc-fixes Samuel Ortiz <sameo@linux.intel.com> says: "This is the first NFC fixes pull request for 3.13. It only contains one fix for a regression introduced with commit e29a9e2ae165620d. Without this fix, we can not establish a p2p link in target mode. Only initiator mode works." Signed-off-by: John W. Linville <linville@tuxdriver.com>
2014-01-07Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller
Pablo Neira Ayuso says: ==================== The following patchset contains two patches: * fix the IRC NAT helper which was broken when adding (incomplete) IPv6 support, from Daniel Borkmann. * Refine the previous bugtrap that Jesper added to catch problems for the usage of the sequence adjustment extension in IPVs in Dec 16th, it may spam messages in case of finding a real bug. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-07tipc: correctly unlink packets from deferred packet queueErik Hugne
When we pull a received packet from a link's 'deferred packets' queue for processing, its 'next' pointer is not cleared, and still refers to the next packet in that queue, if any. This is incorrect, but caused no harm before commit 40ba3cdf542a469aaa9083fa041656e59b109b90 ("tipc: message reassembly using fragment chain") was introduced. After that commit, it may sometimes lead to the following oops: general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC Modules linked in: tipc CPU: 4 PID: 0 Comm: swapper/4 Tainted: G W 3.13.0-rc2+ #6 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007 task: ffff880017af4880 ti: ffff880017aee000 task.ti: ffff880017aee000 RIP: 0010:[<ffffffff81710694>] [<ffffffff81710694>] skb_try_coalesce+0x44/0x3d0 RSP: 0018:ffff880016603a78 EFLAGS: 00010212 RAX: 6b6b6b6bd6d6d6d6 RBX: ffff880013106ac0 RCX: ffff880016603ad0 RDX: ffff880016603ad7 RSI: ffff88001223ed00 RDI: ffff880013106ac0 RBP: ffff880016603ab8 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000000 R12: ffff88001223ed00 R13: ffff880016603ad0 R14: 000000000000058c R15: ffff880012297650 FS: 0000000000000000(0000) GS:ffff880016600000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 000000000805b000 CR3: 0000000011f5d000 CR4: 00000000000006e0 Stack: ffff880016603a88 ffffffff810a38ed ffff880016603aa8 ffff88001223ed00 0000000000000001 ffff880012297648 ffff880016603b68 ffff880012297650 ffff880016603b08 ffffffffa0006c51 ffff880016603b08 00ffffffa00005fc Call Trace: <IRQ> [<ffffffff810a38ed>] ? trace_hardirqs_on+0xd/0x10 [<ffffffffa0006c51>] tipc_link_recv_fragment+0xd1/0x1b0 [tipc] [<ffffffffa0007214>] tipc_recv_msg+0x4e4/0x920 [tipc] [<ffffffffa00016f0>] ? tipc_l2_rcv_msg+0x40/0x250 [tipc] [<ffffffffa000177c>] tipc_l2_rcv_msg+0xcc/0x250 [tipc] [<ffffffffa00016f0>] ? tipc_l2_rcv_msg+0x40/0x250 [tipc] [<ffffffff8171e65b>] __netif_receive_skb_core+0x80b/0xd00 [<ffffffff8171df94>] ? __netif_receive_skb_core+0x144/0xd00 [<ffffffff8171eb76>] __netif_receive_skb+0x26/0x70 [<ffffffff8171ed6d>] netif_receive_skb+0x2d/0x200 [<ffffffff8171fe70>] napi_gro_receive+0xb0/0x130 [<ffffffff815647c2>] e1000_clean_rx_irq+0x2c2/0x530 [<ffffffff81565986>] e1000_clean+0x266/0x9c0 [<ffffffff81985f7b>] ? notifier_call_chain+0x2b/0x160 [<ffffffff8171f971>] net_rx_action+0x141/0x310 [<ffffffff81051c1b>] __do_softirq+0xeb/0x480 [<ffffffff819817bb>] ? _raw_spin_unlock+0x2b/0x40 [<ffffffff810b8c42>] ? handle_fasteoi_irq+0x72/0x100 [<ffffffff81052346>] irq_exit+0x96/0xc0 [<ffffffff8198cbc3>] do_IRQ+0x63/0xe0 [<ffffffff81981def>] common_interrupt+0x6f/0x6f <EOI> This happens when the last fragment of a message has passed through the the receiving link's 'deferred packets' queue, and at least one other packet was added to that queue while it was there. After the fragment chain with the complete message has been successfully delivered to the receiving socket, it is released. Since 'next' pointer of the last fragment in the released chain now is non-NULL, we get the crash shown above. We fix this by clearing the 'next' pointer of all received packets, including those being pulled from the 'deferred' queue, before they undergo any further processing. Fixes: 40ba3cdf542a4 ("tipc: message reassembly using fragment chain") Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Reported-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-07ipv6: pcpu_tstats.syncp should be initialised in ip6_vti.cLi RongQing
initialise pcpu_tstats.syncp to kill the calltrace [ 11.973950] Call Trace: [ 11.973950] [<819bbaff>] dump_stack+0x48/0x60 [ 11.973950] [<819bbaff>] dump_stack+0x48/0x60 [ 11.973950] [<81078dcf>] __lock_acquire.isra.22+0x1bf/0xc10 [ 11.973950] [<81078dcf>] __lock_acquire.isra.22+0x1bf/0xc10 [ 11.973950] [<81079fa7>] lock_acquire+0x77/0xa0 [ 11.973950] [<81079fa7>] lock_acquire+0x77/0xa0 [ 11.973950] [<817ca7ab>] ? dev_get_stats+0xcb/0x130 [ 11.973950] [<817ca7ab>] ? dev_get_stats+0xcb/0x130 [ 11.973950] [<8183862d>] ip_tunnel_get_stats64+0x6d/0x230 [ 11.973950] [<8183862d>] ip_tunnel_get_stats64+0x6d/0x230 [ 11.973950] [<817ca7ab>] ? dev_get_stats+0xcb/0x130 [ 11.973950] [<817ca7ab>] ? dev_get_stats+0xcb/0x130 [ 11.973950] [<811cf8c1>] ? __nla_reserve+0x21/0xd0 [ 11.973950] [<811cf8c1>] ? __nla_reserve+0x21/0xd0 [ 11.973950] [<817ca7ab>] dev_get_stats+0xcb/0x130 [ 11.973950] [<817ca7ab>] dev_get_stats+0xcb/0x130 [ 11.973950] [<817d5409>] rtnl_fill_ifinfo+0x569/0xe20 [ 11.973950] [<817d5409>] rtnl_fill_ifinfo+0x569/0xe20 [ 11.973950] [<810352e0>] ? kvm_clock_read+0x20/0x30 [ 11.973950] [<810352e0>] ? kvm_clock_read+0x20/0x30 [ 11.973950] [<81008e38>] ? sched_clock+0x8/0x10 [ 11.973950] [<81008e38>] ? sched_clock+0x8/0x10 [ 11.973950] [<8106ba45>] ? sched_clock_local+0x25/0x170 [ 11.973950] [<8106ba45>] ? sched_clock_local+0x25/0x170 [ 11.973950] [<810da6bd>] ? __kmalloc+0x3d/0x90 [ 11.973950] [<810da6bd>] ? __kmalloc+0x3d/0x90 [ 11.973950] [<817b8c10>] ? __kmalloc_reserve.isra.41+0x20/0x70 [ 11.973950] [<817b8c10>] ? __kmalloc_reserve.isra.41+0x20/0x70 [ 11.973950] [<810da81a>] ? slob_alloc_node+0x2a/0x60 [ 11.973950] [<810da81a>] ? slob_alloc_node+0x2a/0x60 [ 11.973950] [<817b919a>] ? __alloc_skb+0x6a/0x2b0 [ 11.973950] [<817b919a>] ? __alloc_skb+0x6a/0x2b0 [ 11.973950] [<817d8795>] rtmsg_ifinfo+0x65/0xe0 [ 11.973950] [<817d8795>] rtmsg_ifinfo+0x65/0xe0 [ 11.973950] [<817cbd31>] register_netdevice+0x531/0x5a0 [ 11.973950] [<817cbd31>] register_netdevice+0x531/0x5a0 [ 11.973950] [<81892b87>] ? ip6_tnl_get_cap+0x27/0x90 [ 11.973950] [<81892b87>] ? ip6_tnl_get_cap+0x27/0x90 [ 11.973950] [<817cbdb6>] register_netdev+0x16/0x30 [ 11.973950] [<817cbdb6>] register_netdev+0x16/0x30 [ 11.973950] [<81f574a6>] vti6_init_net+0x1c4/0x1d4 [ 11.973950] [<81f574a6>] vti6_init_net+0x1c4/0x1d4 [ 11.973950] [<81f573af>] ? vti6_init_net+0xcd/0x1d4 [ 11.973950] [<81f573af>] ? vti6_init_net+0xcd/0x1d4 [ 11.973950] [<817c16df>] ops_init.constprop.11+0x17f/0x1c0 [ 11.973950] [<817c16df>] ops_init.constprop.11+0x17f/0x1c0 [ 11.973950] [<817c1779>] register_pernet_operations.isra.9+0x59/0x90 [ 11.973950] [<817c1779>] register_pernet_operations.isra.9+0x59/0x90 [ 11.973950] [<817c18d1>] register_pernet_device+0x21/0x60 [ 11.973950] [<817c18d1>] register_pernet_device+0x21/0x60 [ 11.973950] [<81f574b6>] ? vti6_init_net+0x1d4/0x1d4 [ 11.973950] [<81f574b6>] ? vti6_init_net+0x1d4/0x1d4 [ 11.973950] [<81f574c7>] vti6_tunnel_init+0x11/0x68 [ 11.973950] [<81f574c7>] vti6_tunnel_init+0x11/0x68 [ 11.973950] [<81f572a1>] ? mip6_init+0x73/0xb4 [ 11.973950] [<81f572a1>] ? mip6_init+0x73/0xb4 [ 11.973950] [<81f0cba4>] do_one_initcall+0xbb/0x15b [ 11.973950] [<81f0cba4>] do_one_initcall+0xbb/0x15b [ 11.973950] [<811a00d8>] ? sha_transform+0x528/0x1150 [ 11.973950] [<811a00d8>] ? sha_transform+0x528/0x1150 [ 11.973950] [<81f0c544>] ? repair_env_string+0x12/0x51 [ 11.973950] [<81f0c544>] ? repair_env_string+0x12/0x51 [ 11.973950] [<8105c30d>] ? parse_args+0x2ad/0x440 [ 11.973950] [<8105c30d>] ? parse_args+0x2ad/0x440 [ 11.973950] [<810546be>] ? __usermodehelper_set_disable_depth+0x3e/0x50 [ 11.973950] [<810546be>] ? __usermodehelper_set_disable_depth+0x3e/0x50 [ 11.973950] [<81f0cd27>] kernel_init_freeable+0xe3/0x182 [ 11.973950] [<81f0cd27>] kernel_init_freeable+0xe3/0x182 [ 11.973950] [<81f0c532>] ? do_early_param+0x7a/0x7a [ 11.973950] [<81f0c532>] ? do_early_param+0x7a/0x7a [ 11.973950] [<819b5b1b>] kernel_init+0xb/0x100 [ 11.973950] [<819b5b1b>] kernel_init+0xb/0x100 [ 11.973950] [<819cebf7>] ret_from_kernel_thread+0x1b/0x28 [ 11.973950] [<819cebf7>] ret_from_kernel_thread+0x1b/0x28 [ 11.973950] [<819b5b10>] ? rest_init+0xc0/0xc0 [ 11.973950] [<819b5b10>] ? rest_init+0xc0/0xc0 Before 469bdcefdc ("ipv6: fix the use of pcpu_tstats in ip6_vti.c"), the pcpu_tstats.syncp is not used to pretect the 64bit elements of pcpu_tstats, so not appear this calltrace. Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-06bridge: use spin_lock_bh() in br_multicast_set_hash_maxCurt Brune
br_multicast_set_hash_max() is called from process context in net/bridge/br_sysfs_br.c by the sysfs store_hash_max() function. br_multicast_set_hash_max() calls spin_lock(&br->multicast_lock), which can deadlock the CPU if a softirq that also tries to take the same lock interrupts br_multicast_set_hash_max() while the lock is held . This can happen quite easily when any of the bridge multicast timers expire, which try to take the same lock. The fix here is to use spin_lock_bh(), preventing other softirqs from executing on this CPU. Steps to reproduce: 1. Create a bridge with several interfaces (I used 4). 2. Set the "multicast query interval" to a low number, like 2. 3. Enable the bridge as a multicast querier. 4. Repeatedly set the bridge hash_max parameter via sysfs. # brctl addbr br0 # brctl addif br0 eth1 eth2 eth3 eth4 # brctl setmcqi br0 2 # brctl setmcquerier br0 1 # while true ; do echo 4096 > /sys/class/net/br0/bridge/hash_max; done Signed-off-by: Curt Brune <curt@cumulusnetworks.com> Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-06ipv6: don't install anycast address for /128 addresses on routersHannes Frederic Sowa
It does not make sense to create an anycast address for an /128-prefix. Suppress it. As 32019e651c6fce ("ipv6: Do not leave router anycast address for /127 prefixes.") shows we also may not leave them, because we could accidentally remove an anycast address the user has allocated or got added via another prefix. Cc: François-Xavier Le Bail <fx.lebail@yahoo.com> Cc: Thomas Haller <thaller@redhat.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-06Merge branch 'for-john' of ↵John W. Linville
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
2014-01-06netfilter: only warn once on wrong seqadj usageJesper Dangaard Brouer
Avoid potentially spamming the kernel log with WARN splash messages when catching wrong usage of seqadj, by simply using WARN_ONCE. This is a followup to commit db12cf274353 (netfilter: WARN about wrong usage of sequence number adjustments) Suggested-by: Flavio Leitner <fbl@redhat.com> Suggested-by: Daniel Borkmann <dborkman@redhat.com> Suggested-by: Florian Westphal <fw@strlen.de> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-06netfilter: nf_nat: fix access to uninitialized buffer in IRC NAT helperDaniel Borkmann
Commit 5901b6be885e attempted to introduce IPv6 support into IRC NAT helper. By doing so, the following code seemed to be removed by accident: ip = ntohl(exp->master->tuplehash[IP_CT_DIR_REPLY].tuple.dst.u3.ip); sprintf(buffer, "%u %u", ip, port); pr_debug("nf_nat_irc: inserting '%s' == %pI4, port %u\n", buffer, &ip, port); This leads to the fact that buffer[] was left uninitialized and contained some stack value. When we call nf_nat_mangle_tcp_packet(), we call strlen(buffer) on excatly this uninitialized buffer. If we are unlucky and the skb has enough tailroom, we overwrite resp. leak contents with values that sit on our stack into the packet and send that out to the receiver. Since the rather informal DCC spec [1] does not seem to specify IPv6 support right now, we log such occurences so that admins can act accordingly, and drop the packet. I've looked into XChat source, and IPv6 is not supported there: addresses are in u32 and print via %u format string. Therefore, restore old behaviour as in IPv4, use snprintf(). The IRC helper does not support IPv6 by now. By this, we can safely use strlen(buffer) in nf_nat_mangle_tcp_packet() and prevent a buffer overflow. Also simplify some code as we now have ct variable anyway. [1] http://www.irchelp.org/irchelp/rfc/ctcpspec.html Fixes: 5901b6be885e ("netfilter: nf_nat: support IPv6 in IRC NAT helper") Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Harald Welte <laforge@gnumonks.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-05net: 6lowpan: fix lowpan_header_create non-compression memcpy callDaniel Borkmann
In function lowpan_header_create(), we invoke the following code construct: struct ipv6hdr *hdr; ... hdr = ipv6_hdr(skb); ... if (...) memcpy(hc06_ptr + 1, &hdr->flow_lbl[1], 2); else memcpy(hc06_ptr, &hdr, 4); Where the else path of the condition, that is, non-compression path, calls memcpy() with a pointer to struct ipv6hdr *hdr as source, thus two levels of indirection. This cannot be correct, and likely only one level of pointer was intended as source buffer for memcpy() here. Fixes: 44331fe2aa0d ("IEEE802.15.4: 6LoWPAN basic support") Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Alexander Smirnov <alex.bluesman.smirnov@gmail.com> Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com> Cc: Werner Almesberger <werner@almesberger.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-04NFC: Fix target mode p2p link establishmentArron Wang
With commit e29a9e2ae165620d, we set the active_target pointer from nfc_dep_link_is_up() in order to support the case where the target detection and the DEP link setting are done atomically by the driver. That can only happen in initiator mode, so we need to check for that otherwise we fail to bring a p2p link in target mode. Signed-off-by: Arron Wang <arron.wang@intel.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2014-01-02netpoll: Fix missing TXQ unlock and and OOPS.David S. Miller
The VLAN tag handling code in netpoll_send_skb_on_dev() has two problems. 1) It exits without unlocking the TXQ. 2) It then tries to queue a NULL skb to npinfo->txq. Reported-by: Ahmed Tamrawi <atamrawi@iastate.edu> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-02ipv6: fix the use of pcpu_tstats in ip6_vti.cLi RongQing
when read/write the 64bit data, the correct lock should be hold. and we can use the generic vti6_get_stats to return stats, and not define a new one in ip6_vti.c Fixes: 87b6d218f3adb ("tunnel: implement 64 bits statistics") Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-02ipv6: fix the use of pcpu_tstats in ip6_tunnelLi RongQing
when read/write the 64bit data, the correct lock should be hold. Fixes: 87b6d218f3adb ("tunnel: implement 64 bits statistics") Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-02ipv6 addrconf: fix preferred lifetime state-changing behavior while ↵Yasushi Asano
valid_lft is infinity Fixed a problem with setting the lifetime of an IPv6 address. When setting preferred_lft to a value not zero or infinity, while valid_lft is infinity(0xffffffff) preferred lifetime is set to forever and does not update. Therefore preferred lifetime never becomes deprecated. valid lifetime and preferred lifetime should be set independently, even if valid lifetime is infinity, preferred lifetime must expire correctly (meaning it must eventually become deprecated) Signed-off-by: Yasushi Asano <yasushi.asano@jp.fujitsu.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-02net: llc: fix use after free in llc_ui_recvmsgDaniel Borkmann
While commit 30a584d944fb fixes datagram interface in LLC, a use after free bug has been introduced for SOCK_STREAM sockets that do not make use of MSG_PEEK. The flow is as follow ... if (!(flags & MSG_PEEK)) { ... sk_eat_skb(sk, skb, false); ... } ... if (used + offset < skb->len) continue; ... where sk_eat_skb() calls __kfree_skb(). Therefore, cache original length and work on skb_len to check partial reads. Fixes: 30a584d944fb ("[LLX]: SOCK_DGRAM interface fixes") Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-02ipv4: fix tunneled VM traffic over hw VXLAN/GRE GSO NICWei-Chun Chao
VM to VM GSO traffic is broken if it goes through VXLAN or GRE tunnel and the physical NIC on the host supports hardware VXLAN/GRE GSO offload (e.g. bnx2x and next-gen mlx4). Two issues - (VXLAN) VM traffic has SKB_GSO_DODGY and SKB_GSO_UDP_TUNNEL with SKB_GSO_TCP/UDP set depending on the inner protocol. GSO header integrity check fails in udp4_ufo_fragment if inner protocol is TCP. Also gso_segs is calculated incorrectly using skb->len that includes tunnel header. Fix: robust check should only be applied to the inner packet. (VXLAN & GRE) Once GSO header integrity check passes, NULL segs is returned and the original skb is sent to hardware. However the tunnel header is already pulled. Fix: tunnel header needs to be restored so that hardware can perform GSO properly on the original packet. Signed-off-by: Wei-Chun Chao <weichunc@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-02sctp: Remove outqueue empty stateVlad Yasevich
The SCTP outqueue structure maintains a data chunks that are pending transmission, the list of chunks that are pending a retransmission and a length of data in flight. It also tries to keep the emtpy state so that it can performe shutdown sequence or notify user. The problem is that the empy state is inconsistently tracked. It is possible to completely drain the queue without sending anything when using PR-SCTP. In this case, the empty state will not be correctly state as report by Jamal Hadi Salim <jhs@mojatatu.com>. This can cause an association to be perminantly stuck in the SHUTDOWN_PENDING state. Additionally, SCTP is incredibly inefficient when setting the empty state. Even though all the data is availaible in the outqueue structure, we ignore it and walk a list of trasnports. In the end, we can completely remove the extra empty state and figure out if the queue is empty by looking at 3 things: length of pending data, length of in-flight data, and exisiting of retransmit data. All of these are already in the strucutre. Reported-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Vlad Yasevich <vyasevich@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Tested-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-01ipv6: fix the use of pcpu_tstats in sitLi RongQing
when read/write the 64bit data, the correct lock should be hold. Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-31vlan: Fix header ops passthru when doing TX VLAN offload.David S. Miller
When the vlan code detects that the real device can do TX VLAN offloads in hardware, it tries to arrange for the real device's header_ops to be invoked directly. But it does so illegally, by simply hooking the real device's header_ops up to the VLAN device. This doesn't work because we will end up invoking a set of header_ops routines which expect a device type which matches the real device, but will see a VLAN device instead. Fix this by providing a pass-thru set of header_ops which will arrange to pass the proper real device instead. To facilitate this add a dev_rebuild_header(). There are implementations which provide a ->cache and ->create but not a ->rebuild (f.e. PLIP). So we need a helper function just like dev_hard_header() to avoid crashes. Use this helper in the one existing place where the header_ops->rebuild was being invoked, the neighbour code. With lots of help from Florian Westphal. Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-29net: rose: restore old recvmsg behaviorFlorian Westphal
recvmsg handler in net/rose/af_rose.c performs size-check ->msg_namelen. After commit f3d3342602f8bcbf37d7c46641cb9bca7618eb1c (net: rework recvmsg handler msg_name and msg_namelen logic), we now always take the else branch due to namelen being initialized to 0. Digging in netdev-vger-cvs git repo shows that msg_namelen was initialized with a fixed-size since at least 1995, so the else branch was never taken. Compile tested only. Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-29tipc: fix deadlock during socket releaseYing Xue
A deadlock might occur if name table is withdrawn in socket release routine, and while packets are still being received from bearer. CPU0 CPU1 T0: recv_msg() release() T1: tipc_recv_msg() tipc_withdraw() T2: [grab node lock] [grab port lock] T3: tipc_link_wakeup_ports() tipc_nametbl_withdraw() T4: [grab port lock]* named_cluster_distribute() T5: wakeupdispatch() tipc_link_send() T6: [grab node lock]* The opposite order of holding port lock and node lock on above two different paths may result in a deadlock. If socket lock instead of port lock is used to protect port instance in tipc_withdraw(), the reverse order of holding port lock and node lock will be eliminated, as a result, the deadlock is killed as well. Reported-by: Lars Everbrand <lars.everbrand@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-29Merge tag 'batman-adv-fix-for-davem' of git://git.open-mesh.org/linux-mergeDavid S. Miller
Included changes: - reset netfilter-bridge state when removing the batman-adv header from an incoming packet. This prevents netfilter bridge from being fooled when the same packet enters a bridge twice (or more): the first time within the batman-adv header and the second time without. - adjust the packet layout to prevent any architecture from adding padding bytes. All the structs sent over the wire now have size multiple of 4bytes (unless pack(2) is used). - fix access to the inner vlan_eth header when reading the VID in the rx path. Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-29Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter/IPVS fixes for net This patchset contains four nf_tables fixes, one IPVS fix due to missing updates in the interaction with the new sedadj conntrack extension that was added to support the netfilter synproxy code, and a couple of one-liners to fix netnamespace netfilter issues. More specifically, they are: * Fix ipv6_find_hdr() call without offset being explicitly initialized in nft_exthdr, as required by that function, from Daniel Borkmann. * Fix oops in nfnetlink_log when using netns and unloading the kernel module, from Gao feng. * Fix BUG_ON in nf_ct_timestamp extension after netns is destroyed, from Helmut Schaa. * Fix crash in IPVS due to missing sequence adjustment extension being allocated in the conntrack, from Jesper Dangaard Brouer. * Add bugtrap to spot a warning in case you deference sequence adjustment conntrack area when not available, this should help to catch similar invalid dereferences in the Netfilter tree, also from Jesper. * Fix incomplete dumping of sets in nf_tables when retrieving by family, from me. * Fix oops when updating the table state (dormant <-> active) and having user (not base ) chains, from me. * Fix wrong validation in set element data that results in returning -EINVAL when using the nf_tables dictionary feature with mappings, also from me. We don't usually have this amount of fixes by this time (as we're already in -rc5 of the development cycle), although half of them are related to nf_tables which is a relatively new thing, and I also believe that holidays have also delayed the flight of bugfixes to mainstream a bit. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-28netfilter: nf_tables: fix wrong datatype in nft_validate_data_load()Pablo Neira Ayuso
This patch fixes dictionary mappings, eg. add rule ip filter input meta dnat set tcp dport map { 22 => 1.1.1.1, 23 => 2.2.2.2 } The kernel was returning -EINVAL in nft_validate_data_load() since the type of the set element data that is passed was the real userspace datatype instead of NFT_DATA_VALUE. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-12-28batman-adv: fix vlan header accessAntonio Quartulli
When batadv_get_vid() is invoked in interface_rx() the batman-adv header has already been removed, therefore the header_len argument has to be 0. Introduced by c018ad3de61a1dc4194879a53e5559e094aa7b1a ("batman-adv: add the VLAN ID attribute to the TT entry") Signed-off-by: Antonio Quartulli <antonio@meshcoding.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
2013-12-28batman-adv: clean nf state when removing protocol headerAntonio Quartulli
If an interface enslaved into batman-adv is a bridge (or a virtual interface built on top of a bridge) the nf_bridge member of the skbs reaching the soft-interface is filled with the state about "netfilter bridge" operations. Then, if one of such skbs is locally delivered, the nf_bridge member should be cleaned up to avoid that the old state could mess up with other "netfilter bridge" operations when entering a second bridge. This is needed because batman-adv is an encapsulation protocol. However at the moment skb->nf_bridge is not released at all leading to bogus "netfilter bridge" behaviours. Fix this by cleaning the netfilter state of the skb before it gets delivered to the upper layer in interface_rx(). Signed-off-by: Antonio Quartulli <antonio@meshcoding.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
2013-12-28batman-adv: fix alignment for batadv_tvlv_tt_changeAntonio Quartulli
Make struct batadv_tvlv_tt_change a multiple 4 bytes long to avoid padding on any architecture. Signed-off-by: Antonio Quartulli <antonio@meshcoding.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
2013-12-28batman-adv: fix size of batadv_bla_claim_dstSimon Wunderlich
Since this is a mac address and always 48 bit, and we can assume that it is always aligned to 2-byte boundaries, add a pack(2) pragma. Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2013-12-28batman-adv: fix size of batadv_icmp_headerAntonio Quartulli
struct batadv_icmp_header currently has a size of 17, which will be padded to 20 on some architectures. Fix this by unrolling the header into the parent structures. Moreover keep the ICMP parsing functions as generic as they are now by using a stub icmp_header struct during packet parsing. Signed-off-by: Antonio Quartulli <antonio@meshcoding.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
2013-12-28batman-adv: fix header alignment by unrolling batadv_headerSimon Wunderlich
The size of the batadv_header of 3 is problematic on some architectures which automatically pad all structures to a 32 bit boundary. To not lose performance by packing this struct, better embed it into the various host structures. Reported-by: Russell King <linux@arm.linux.org.uk> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2013-12-28batman-adv: fix alignment for batadv_coded_packetSimon Wunderlich
The compiler may decide to pad the structure, and then it does not have the expected size of 46 byte. Fix this by moving it in the pragma pack(2) part of the code. Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2013-12-28netfilter: nf_tables: fix oops when updating table with user chainsPablo Neira Ayuso
This patch fixes a crash while trying to deactivate a table that contains user chains. You can reproduce it via: % nft add table table1 % nft add chain table1 chain1 % nft-table-upd ip table1 dormant [ 253.021026] BUG: unable to handle kernel NULL pointer dereference at 0000000000000030 [ 253.021114] IP: [<ffffffff8134cebd>] nf_register_hook+0x35/0x6f [ 253.021167] PGD 30fa5067 PUD 30fa2067 PMD 0 [ 253.021208] Oops: 0000 [#1] SMP [...] [ 253.023305] Call Trace: [ 253.023331] [<ffffffffa0885020>] nf_tables_newtable+0x11c/0x258 [nf_tables] [ 253.023385] [<ffffffffa0878592>] nfnetlink_rcv_msg+0x1f4/0x226 [nfnetlink] [ 253.023438] [<ffffffffa0878418>] ? nfnetlink_rcv_msg+0x7a/0x226 [nfnetlink] [ 253.023491] [<ffffffffa087839e>] ? nfnetlink_bind+0x45/0x45 [nfnetlink] [ 253.023542] [<ffffffff8134b47e>] netlink_rcv_skb+0x3c/0x88 [ 253.023586] [<ffffffffa0878973>] nfnetlink_rcv+0x3af/0x3e4 [nfnetlink] [ 253.023638] [<ffffffff813fb0d4>] ? _raw_read_unlock+0x22/0x34 [ 253.023683] [<ffffffff8134af17>] netlink_unicast+0xe2/0x161 [ 253.023727] [<ffffffff8134b29a>] netlink_sendmsg+0x304/0x332 [ 253.023773] [<ffffffff8130d250>] __sock_sendmsg_nosec+0x25/0x27 [ 253.023820] [<ffffffff8130fb93>] sock_sendmsg+0x5a/0x7b [ 253.023861] [<ffffffff8130d5d5>] ? copy_from_user+0x2a/0x2c [ 253.023905] [<ffffffff8131066f>] ? move_addr_to_kernel+0x35/0x60 [ 253.023952] [<ffffffff813107b3>] SYSC_sendto+0x119/0x15c [ 253.023995] [<ffffffff81401107>] ? sysret_check+0x1b/0x56 [ 253.024039] [<ffffffff8108dc30>] ? trace_hardirqs_on_caller+0x140/0x1db [ 253.024090] [<ffffffff8120164e>] ? trace_hardirqs_on_thunk+0x3a/0x3f [ 253.024141] [<ffffffff81310caf>] SyS_sendto+0x9/0xb [ 253.026219] [<ffffffff814010e2>] system_call_fastpath+0x16/0x1b Reported-by: Alex Wei <alex.kern.mentor@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-12-28netfilter: nf_tables: fix dumping with large number of setsPablo Neira Ayuso
If not table name is specified, the dumping of the existing sets may be incomplete with a sufficiently large number of sets and tables. This patch fixes missing reset of the cursors after finding the location of the last object that has been included in the previous multi-part message. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-12-27ipv6: release dst properly in ipip6_tunnel_xmitLi RongQing
if a dst is not attached to anywhere, it should be released before exit ipip6_tunnel_xmit, otherwise cause dst memory leakage. Fixes: 61c1db7fae21 ("ipv6: sit: add GSO/TSO support") Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-27net_sched: act: Dont increment refcnt on replaceJamal Hadi Salim
This is a bug fix. The existing code tries to kill many birds with one stone: Handling binding of actions to filters, new actions and replacing of action attributes. A simple test case to illustrate: XXXX moja@fe1:~$ sudo tc actions add action drop index 12 moja@fe1:~$ actions get action gact index 12 action order 1: gact action drop random type none pass val 0 index 12 ref 1 bind 0 moja@fe1:~$ sudo tc actions replace action ok index 12 moja@fe1:~$ actions get action gact index 12 action order 1: gact action drop random type none pass val 0 index 12 ref 2 bind 0 XXXX The above shows the refcounf being wrongly incremented on replace. There are more complex scenarios with binding of actions to filters that i am leaving out that didnt work as well... Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-27rds: prevent dereference of a NULL deviceSasha Levin
Binding might result in a NULL device, which is dereferenced causing this BUG: [ 1317.260548] BUG: unable to handle kernel NULL pointer dereference at 000000000000097 4 [ 1317.261847] IP: [<ffffffff84225f52>] rds_ib_laddr_check+0x82/0x110 [ 1317.263315] PGD 418bcb067 PUD 3ceb21067 PMD 0 [ 1317.263502] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC [ 1317.264179] Dumping ftrace buffer: [ 1317.264774] (ftrace buffer empty) [ 1317.265220] Modules linked in: [ 1317.265824] CPU: 4 PID: 836 Comm: trinity-child46 Tainted: G W 3.13.0-rc4- next-20131218-sasha-00013-g2cebb9b-dirty #4159 [ 1317.267415] task: ffff8803ddf33000 ti: ffff8803cd31a000 task.ti: ffff8803cd31a000 [ 1317.268399] RIP: 0010:[<ffffffff84225f52>] [<ffffffff84225f52>] rds_ib_laddr_check+ 0x82/0x110 [ 1317.269670] RSP: 0000:ffff8803cd31bdf8 EFLAGS: 00010246 [ 1317.270230] RAX: 0000000000000000 RBX: ffff88020b0dd388 RCX: 0000000000000000 [ 1317.270230] RDX: ffffffff8439822e RSI: 00000000000c000a RDI: 0000000000000286 [ 1317.270230] RBP: ffff8803cd31be38 R08: 0000000000000000 R09: 0000000000000000 [ 1317.270230] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000 [ 1317.270230] R13: 0000000054086700 R14: 0000000000a25de0 R15: 0000000000000031 [ 1317.270230] FS: 00007ff40251d700(0000) GS:ffff88022e200000(0000) knlGS:000000000000 0000 [ 1317.270230] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 1317.270230] CR2: 0000000000000974 CR3: 00000003cd478000 CR4: 00000000000006e0 [ 1317.270230] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 1317.270230] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000090602 [ 1317.270230] Stack: [ 1317.270230] 0000000054086700 5408670000a25de0 5408670000000002 0000000000000000 [ 1317.270230] ffffffff84223542 00000000ea54c767 0000000000000000 ffffffff86d26160 [ 1317.270230] ffff8803cd31be68 ffffffff84223556 ffff8803cd31beb8 ffff8800c6765280 [ 1317.270230] Call Trace: [ 1317.270230] [<ffffffff84223542>] ? rds_trans_get_preferred+0x42/0xa0 [ 1317.270230] [<ffffffff84223556>] rds_trans_get_preferred+0x56/0xa0 [ 1317.270230] [<ffffffff8421c9c3>] rds_bind+0x73/0xf0 [ 1317.270230] [<ffffffff83e4ce62>] SYSC_bind+0x92/0xf0 [ 1317.270230] [<ffffffff812493f8>] ? context_tracking_user_exit+0xb8/0x1d0 [ 1317.270230] [<ffffffff8119313d>] ? trace_hardirqs_on+0xd/0x10 [ 1317.270230] [<ffffffff8107a852>] ? syscall_trace_enter+0x32/0x290 [ 1317.270230] [<ffffffff83e4cece>] SyS_bind+0xe/0x10 [ 1317.270230] [<ffffffff843a6ad0>] tracesys+0xdd/0xe2 [ 1317.270230] Code: 00 8b 45 cc 48 8d 75 d0 48 c7 45 d8 00 00 00 00 66 c7 45 d0 02 00 89 45 d4 48 89 df e8 78 49 76 ff 41 89 c4 85 c0 75 0c 48 8b 03 <80> b8 74 09 00 00 01 7 4 06 41 bc 9d ff ff ff f6 05 2a b6 c2 02 [ 1317.270230] RIP [<ffffffff84225f52>] rds_ib_laddr_check+0x82/0x110 [ 1317.270230] RSP <ffff8803cd31bdf8> [ 1317.270230] CR2: 0000000000000974 Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-27ipvs: correct usage/allocation of seqadj ext in ipvsJesper Dangaard Brouer
The IPVS FTP helper ip_vs_ftp could trigger an OOPS in nf_ct_seqadj_set, after commit 41d73ec053d2 (netfilter: nf_conntrack: make sequence number adjustments usuable without NAT). This is because, the seqadj ext is now allocated dynamically, and the IPVS code didn't handle this situation. Fix this in the IPVS nfct code by invoking the alloc function nfct_seqadj_ext_add(). Fixes: 41d73ec053d2 (netfilter: nf_conntrack: make sequence number adjustments usuable without NAT) Suggested-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2013-12-27netfilter: WARN about wrong usage of sequence number adjustmentsJesper Dangaard Brouer
Since commit 41d73ec053d2 (netfilter: nf_conntrack: make sequence number adjustments usuable without NAT), the sequence number extension is dynamically allocated. Instead of dying, give a WARN splash, in case of wrong usage of the seqadj code, e.g. when forgetting to allocate via nfct_seqadj_ext_add(). Wrong usage have been seen in the IPVS code path. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2013-12-22ipv4: consistent reporting of pmtu data in case of corkingHannes Frederic Sowa
We report different pmtu values back on the first write and on further writes on an corked socket. Also don't include the dst.header_len (respectively exthdrlen) as this should already be dealt with by the interface mtu of the outgoing (virtual) interface and policy of that interface should dictate if fragmentation should happen. Instead reduce the pmtu data by IP options as we do for IPv6. Make the same changes for ip_append_data, where we did not care about options or dst.header_len at all. Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-20Merge branch 'master' of ↵John W. Linville
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem
2013-12-20netfilter: nf_ct_timestamp: Fix BUG_ON after netns deletionHelmut Schaa
When having nf_conntrack_timestamp enabled deleting a netns can lead to the following BUG being triggered: [63836.660000] Kernel bug detected[#1]: [63836.660000] CPU: 0 PID: 0 Comm: swapper Not tainted 3.10.18 #14 [63836.660000] task: 802d9420 ti: 802d2000 task.ti: 802d2000 [63836.660000] $ 0 : 00000000 00000000 00000000 00000000 [63836.660000] $ 4 : 00000001 00000004 00000020 00000020 [63836.660000] $ 8 : 00000000 80064910 00000000 00000000 [63836.660000] $12 : 0bff0002 00000001 00000000 0a0a0abe [63836.660000] $16 : 802e70a0 85f29d80 00000000 00000004 [63836.660000] $20 : 85fb62a0 00000002 802d3bc0 85fb62a0 [63836.660000] $24 : 00000000 87138110 [63836.660000] $28 : 802d2000 802d3b40 00000014 871327cc [63836.660000] Hi : 000005ff [63836.660000] Lo : f2edd000 [63836.660000] epc : 87138794 __nf_ct_ext_add_length+0xe8/0x1ec [nf_conntrack] [63836.660000] Not tainted [63836.660000] ra : 871327cc nf_conntrack_in+0x31c/0x7b8 [nf_conntrack] [63836.660000] Status: 1100d403 KERNEL EXL IE [63836.660000] Cause : 00800034 [63836.660000] PrId : 0001974c (MIPS 74Kc) [63836.660000] Modules linked in: ath9k ath9k_common pppoe ppp_async iptable_nat ath9k_hw ath pppox ppp_generic nf_nat_ipv4 nf_conntrack_ipv4 mac80211 ipt_MASQUERADE cfg80211 xt_time xt_tcpudp xt_state xt_quota xt_policy xt_pkttype xt_owner xt_nat xt_multiport xt_mark xh [63836.660000] Process swapper (pid: 0, threadinfo=802d2000, task=802d9420, tls=00000000) [63836.660000] Stack : 802e70a0 871323d4 00000005 87080234 802e70a0 86d2a840 00000000 00000000 [63836.660000] Call Trace: [63836.660000] [<87138794>] __nf_ct_ext_add_length+0xe8/0x1ec [nf_conntrack] [63836.660000] [<871327cc>] nf_conntrack_in+0x31c/0x7b8 [nf_conntrack] [63836.660000] [<801ff63c>] nf_iterate+0x90/0xec [63836.660000] [<801ff730>] nf_hook_slow+0x98/0x164 [63836.660000] [<80205968>] ip_rcv+0x3e8/0x40c [63836.660000] [<801d9754>] __netif_receive_skb_core+0x624/0x6a4 [63836.660000] [<801da124>] process_backlog+0xa4/0x16c [63836.660000] [<801d9bb4>] net_rx_action+0x10c/0x1e0 [63836.660000] [<8007c5a4>] __do_softirq+0xd0/0x1bc [63836.660000] [<8007c730>] do_softirq+0x48/0x68 [63836.660000] [<8007c964>] irq_exit+0x54/0x70 [63836.660000] [<80060830>] ret_from_irq+0x0/0x4 [63836.660000] [<8006a9f8>] r4k_wait_irqoff+0x18/0x1c [63836.660000] [<8009cfb8>] cpu_startup_entry+0xa4/0x104 [63836.660000] [<802eb918>] start_kernel+0x394/0x3ac [63836.660000] [63836.660000] Code: 00821021 8c420000 2c440001 <00040336> 90440011 92350010 90560010 2485ffff 02a5a821 [63837.040000] ---[ end trace ebf660c3ce3b55e7 ]--- [63837.050000] Kernel panic - not syncing: Fatal exception in interrupt [63837.050000] Rebooting in 3 seconds.. Fix this by not unregistering the conntrack extension in the per-netns cleanup code. This bug was introduced in (73f4001 netfilter: nf_ct_tstamp: move initialization out of pernet_operations). Signed-off-by: Helmut Schaa <helmut.schaa@googlemail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-12-20netfilter: nft_exthdr: call ipv6_find_hdr() with explicitly initialized offsetDaniel Borkmann
In nft's nft_exthdr_eval() routine we process IPv6 extension header through invoking ipv6_find_hdr(), but we call it with an uninitialized offset variable that contains some stack value. In ipv6_find_hdr() we then test if the value of offset != 0 and call skb_header_pointer() on that offset in order to map struct ipv6hdr into it. Fix it up by initializing offset to 0 as it was probably intended to be. Fixes: 96518518cc41 ("netfilter: add nftables") Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-12-19dccp: catch failed request_module call in dccp_probe initWang Weidong
Check the return value of request_module during dccp_probe initialisation, bail out if that call fails. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Wang Weidong <wangweidong1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-19Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net Jeff Kirsher says: ==================== Intel Wired LAN Driver Updates This series contains updates to net, ixgbe and e1000e. David provides compiler fixes for e1000e. Don provides a fix for ixgbe to resolve a compile warning. John provides a fix to net where it is useful to be able to walk all upper devices when bringing a device online where the RTNL lock is held. In this case, it is safe to walk the all_adj_list because the RTNL lock is used to protect the write side as well. This patch adds a check to see if the RTNL lock is held before throwing a warning in netdev_all_upper_get_next_dev_rcu(). ==================== Signed-off-by: David S. Miller <davem@davemloft.net>