path: root/net/sched
Age | Commit message | Author
2006-12-02 | [NET_SCHED]: Fix endless loops (part 5): netem/tbf/hfsc ->requeue failures | Patrick McHardy
When peeking at the next packet in a child qdisc by calling dequeue/requeue, the upper qdisc qlen counter may get out of sync in case the requeue fails. The qdisc and the child qdisc both have their counter decremented, but since no packet is given to the upper qdisc it won't decrement its counter itself. requeue should not fail, so this is mostly for "correctness".
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 | [NET_SCHED]: Fix endless loops (part 4): HTB | Patrick McHardy
Convert HTB to use qdisc_tree_decrease_qlen() and add a callback for deactivating a class when its child queue becomes empty.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 | [NET_SCHED]: Fix endless loops (part 3): HFSC | Patrick McHardy
Convert HFSC to use qdisc_tree_decrease_qlen() and add a callback for deactivating a class when its child queue becomes empty. All queue purging goes through hfsc_purge_queue(), which is used in three cases: grafting, class creation (when a leaf class is turned into an intermediate class by attaching a new class) and class deletion. In all cases qdisc_tree_decrease_qlen() is needed.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 | [NET_SCHED]: Fix endless loops (part 2): "simple" qdiscs | Patrick McHardy
Convert the "simple" qdiscs to use qdisc_tree_decrease_qlen() where necessary:
- all graft operations
- destruction of old child qdiscs in prio, red and tbf change operation
- purging of queue in sfq change operation
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 | [NET_SCHED]: Fix endless loops caused by inaccurate qlen counters (part 1) | Patrick McHardy
There are multiple problems related to qlen adjustment that can lead to an upper qdisc getting out of sync with the real number of packets queued, leading to endless dequeueing attempts by the upper layer code.

All qdiscs must maintain an accurate q.qlen counter. There are basically two groups of operations affecting the qlen: operations that propagate down the tree (enqueue, dequeue, requeue, drop, reset) beginning at the root qdisc and operations only affecting a subtree or single qdisc (change, graft, delete class). Since qlen changes during operations from the second group don't propagate to ancestor qdiscs, their qlen values become desynchronized.

This patch adds a function to propagate qlen changes up the qdisc tree, optionally calling a callback function to perform qdisc-internal maintenance when the child qdisc becomes empty. The follow-up patches will convert all qdiscs to use this function where necessary.

Noticed by Timo Steinbach <tsteinbach@astaro.com>.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
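The upward propagation described in this commit can be illustrated with a small user-space model. This is only a sketch under stated assumptions: `struct model_qdisc`, `model_tree_decrease_qlen()`, `model_purge()` and the `on_child_empty` hook are hypothetical names invented for illustration and do not reproduce the kernel's data structures or the actual patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a qdisc; not the kernel's struct Qdisc. */
struct model_qdisc {
    struct model_qdisc *parent;  /* NULL for the root */
    unsigned int qlen;           /* packets queued in this subtree */
    /* Optional hook, run on the parent when the child below it is empty. */
    void (*on_child_empty)(struct model_qdisc *self, struct model_qdisc *child);
    int deactivated;             /* demo counter bumped by the hook */
};

/* Example hook: record that a class was deactivated. */
static void model_deactivate(struct model_qdisc *self, struct model_qdisc *child)
{
    (void)child;
    self->deactivated++;
}

/*
 * The caller has already removed n packets from `child` itself.
 * Walk up the tree and subtract n from every ancestor, notifying an
 * ancestor when the qdisc directly below it has become empty.
 */
static void model_tree_decrease_qlen(struct model_qdisc *child, unsigned int n)
{
    struct model_qdisc *p;

    if (n == 0)
        return;
    for (p = child->parent; p != NULL; child = p, p = p->parent) {
        if (child->qlen == 0 && p->on_child_empty != NULL)
            p->on_child_empty(p, child);
        assert(p->qlen >= n);    /* an ancestor counts at least the child's packets */
        p->qlen -= n;
    }
}

/* A "second group" operation: purge a leaf and keep the ancestors in sync. */
static unsigned int model_purge(struct model_qdisc *leaf)
{
    unsigned int dropped = leaf->qlen;

    leaf->qlen = 0;
    model_tree_decrease_qlen(leaf, dropped);
    return dropped;
}
```

Without the call in `model_purge()`, the ancestors would keep counting packets that no longer exist, which is the desynchronization the commit message describes.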
2006-12-02 | [NET_SCHED]: Set parent classid in default qdiscs | Patrick McHardy
Set parent classids in default qdiscs to allow walking up the tree from outside the qdiscs. This is needed by the next patch.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 | [NET_SCHED]: sch_htb: perform qlen adjustment immediately in ->delete | Patrick McHardy
qlen adjustment should happen immediately in ->delete and not in the class destroy function because the reference count will not hit zero in ->delete (sch_api holds a reference) but in ->put. Since the qdisc lock is released between deletion of the class and final destruction this creates an externally visible error in the qlen counter.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 | [SCHED]: Use kmemdup & kzalloc where appropriate | Arnaldo Carvalho de Melo
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-02 | [NET]: net/sched annotations. | Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 | [PKT_SCHED]: Make sch_fifo.o available when CONFIG_NET_SCHED is not set. | David Kimdon
Based on patch by Patrick McHardy.

Add a new option, NET_SCH_FIFO, which provides a simple fifo qdisc without requiring CONFIG_NET_SCHED.

The d80211 stack needs a generic fifo qdisc for WME. At present it uses net/d80211/fifo_qdisc.c which is functionally equivalent to sch_fifo.c. This patch will allow the d80211 stack to remove net/d80211/fifo_qdisc.c and use sch_fifo.c instead.

Signed-off-by: David Kimdon <david.kimdon@devicescape.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 | [NET]: Turn nfmark into generic mark | Thomas Graf
nfmark is being used in various subsystems and has become the de facto mark field for all kinds of packets. Therefore it makes sense to rename it to `mark' and remove the dependency on CONFIG_NETFILTER.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-11-07 | [PKT_SCHED] sch_htb: Use hlist_del_init(). | Stephen Hemminger
Otherwise we can hit paths that (legally) do multiple deletes on the same node and OOPS with the HLIST poison values there instead of NULL.
Signed-off-by: David S. Miller <davem@davemloft.net>
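Why a "delete and re-initialize" operation makes repeated deletes harmless can be sketched with a minimal hash-list node in user space. The `model_*` names below are hypothetical and the code is an illustration of the idea, not the kernel's hlist implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical minimal hash-list node: `pprev` points at whatever points at us. */
struct model_hnode {
    struct model_hnode *next;
    struct model_hnode **pprev;
};

/* A node that is not on any list has pprev == NULL. */
static int model_unhashed(const struct model_hnode *n)
{
    return n->pprev == NULL;
}

/* Insert n at the head of the list whose first-pointer is *head. */
static void model_add_head(struct model_hnode *n, struct model_hnode **head)
{
    assert(n != NULL && head != NULL);
    n->next = *head;
    if (*head != NULL)
        (*head)->pprev = &n->next;
    *head = n;
    n->pprev = head;
}

/*
 * Unlink n and reset it to the "unhashed" state. Because the node is
 * re-initialized rather than left with stale (or poisoned) pointers,
 * a second call sees pprev == NULL and simply returns.
 */
static void model_del_init(struct model_hnode *n)
{
    assert(n != NULL);
    if (model_unhashed(n))
        return;
    *n->pprev = n->next;
    if (n->next != NULL)
        n->next->pprev = n->pprev;
    n->next = NULL;
    n->pprev = NULL;
}
```

A plain delete that leaves poison values in the node would dereference them on the second call, which matches the OOPS the commit message mentions.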
2006-10-31 | [PATCH] skge, sky2, et al. gplv2 only | Stephen Hemminger
I don't want my code to be downgraded to GPLv3 because of cut-n-pasted comments. These files, which I hold copyright on, were started before it was clear what GPLv3 was going to be.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2006-10-22 | [PKT_SCHED] netem: Orphan SKB when adding to queue. | David S. Miller
The networking emulator can queue SKBs for a very long time, so if you're using netem on the sender side for large bandwidth/delay product testing, the SKB socket send queue sizes become artificially larger. Correct this by calling skb_orphan() in netem_enqueue().
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-10-12 | [PKT_SCHED] sch_htb: use rb_first() cleanup | Akinobu Mita
Use rb_first() to get first entry in rb tree.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-10-04 | [NET_SCHED]: Remove old estimator implementation | Patrick McHardy
Remove unused file, estimators live in net/core/gen_estimator.c now.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-10-04 | [NET_SCHED]: Revert "HTB: fix incorrect use of RB_EMPTY_NODE" | Ismail Donmez
With commit 10fd48f2376db52f08bf0420d2c4f580e39269e1 [1], RB_EMPTY_NODE changed behaviour so it returns true when the node is empty as expected. Hence Patrick McHardy's fix for sch_htb.c should be reverted.
Signed-off-by: Ismail Donmez <ismail@pardus.org.tr>
ACKed-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-28 | [NET_SCHED]: Fix fallout from dev->qdisc RCU change | Patrick McHardy
The move of qdisc destruction to an RCU callback broke locking in the entire qdisc layer by invalidating previously valid assumptions about the context in which changes to the qdisc tree occur.

The two assumptions were:

- since changes only happen in process context, read_lock doesn't need bottom half protection. Now invalid since destruction of inner qdiscs, classifiers, actions and estimators happens in the RCU callback unless they're manually deleted, resulting in dead-locks when read_lock in process context is interrupted by write_lock_bh in bottom half context.

- since changes only happen under the RTNL, no additional locking is necessary for data not used during packet processing (e.g. u32_list). Again, since destruction now happens in the RCU callback, this assumption is not valid anymore, causing races while using this data, which can result in corruption or use-after-free.

Instead of "fixing" this by disabling bottom halves everywhere and adding new locks/refcounting, this patch makes these assumptions valid again by moving destruction back to process context. Since only the dev->qdisc pointer is protected by RCU, but ->enqueue and the qdisc tree are still protected by dev->qdisc_lock, destruction of the tree can be performed immediately and only the final free needs to happen in the RCU callback to make sure dev_queue_xmit doesn't access already freed memory.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-28 | [NET_SCHED]: HTB: fix incorrect use of RB_EMPTY_NODE | Patrick McHardy
Fix incorrect use of RB_EMPTY_NODE in htb_safe_rb_erase, which makes it skip nodes within the rbtree instead of nodes not in the tree, resulting in crashes later on.

The root cause for this seems to be the very counter-intuitive behaviour of the RB_EMPTY_NODE macro, which returns _false_ when the node is empty.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-28 | [PKT_SCHED] cls_basic: Use unsigned int when generating handle | Kim Nordlund
Prevents filters from being added if the first generated handle already exists.
Signed-off-by: Kim Nordlund <kim.nordlund@nokia.com>
Signed-off-by: Thomas Graf <tgraf@suug.ch>
2006-09-22 | [PKT_SCHED] act_simple.c: make struct simp_hash_info static | Adrian Bunk
This patch makes the needlessly global struct simp_hash_info static.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 | [NET_SCHED]: Add mask support to fwmark classifier | Patrick McHardy
Support masking the nfmark value before the search. The mask value is global for all filters contained in one instance. It can only be set when a new instance is created; all filters must specify the same mask.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
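Masking the mark before the lookup can be sketched as follows. The structures and the linear search are assumptions made for illustration (`model_fw_head`, `model_fw_filter` and `model_fw_classify()` are hypothetical names, and the real classifier's lookup structure is not reproduced here).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical filter entry: match on `id`, return `classid`. */
struct model_fw_filter {
    unsigned int id;
    unsigned int classid;
};

/* One classifier instance: a single mask shared by all of its filters. */
struct model_fw_head {
    unsigned int mask;
    const struct model_fw_filter *filters;
    size_t nfilters;
};

/*
 * Apply the instance-wide mask to the packet mark, then search for a
 * filter whose id equals the masked value. Returns 0 when nothing matches.
 */
static unsigned int model_fw_classify(const struct model_fw_head *head,
                                      unsigned int mark)
{
    size_t i;

    assert(head != NULL);
    mark &= head->mask;
    for (i = 0; i < head->nfilters; i++) {
        if (head->filters[i].id == mark)
            return head->filters[i].classid;
    }
    return 0;
}
```

Because the mask lives in the instance rather than in each filter, every filter is compared against the same masked value, which is consistent with the commit's requirement that all filters specify the same mask.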
2006-09-22 | [NETFILTER]: x_tables: remove unused size argument to check/destroy functions | Patrick McHardy
The size is verified by x_tables and isn't needed by the modules anymore.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 | [NETFILTER]: x_tables: remove unused argument to target functions | Patrick McHardy
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 | [PKT_SCHED]: Kill pkt_act.h inlining. | David S. Miller
This was simply making templates of functions and mostly causing a lot of code duplication in the classifier action modules.

We solve this more cleanly by having a common "struct tcf_common" that hash worker functions contained once in act_api.c can work with.

Callers work with real action objects that have the common struct plus their module specific struct members. You go from a common object to the higher level one using a "to_foo()" macro which makes use of container_of() to do the dirty work.

This also kills off act_generic.h, which was only used by act_simple.c; keeping it around was more work than it was worth.

Signed-off-by: David S. Miller <davem@davemloft.net>
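The "common struct plus to_foo() macro" pattern can be sketched in portable user-space C. Everything named `model_*` below is hypothetical, and `model_container_of()` is a simplified stand-in for the kernel's container_of(); the member layout is chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for container_of(): step back from a member to its enclosing object. */
#define model_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* State shared by every action type; generic helpers only ever see this. */
struct model_common {
    unsigned int index;
    int refcnt;
};

/* A module-specific action object embedding the common part. */
struct model_foo {
    int foo_private;              /* module-specific member */
    struct model_common common;   /* embedded common part */
};

/* Go from the common object back to the module's own type. */
#define to_foo(pc) model_container_of(pc, struct model_foo, common)

/* Generic worker written once: it needs no knowledge of struct model_foo. */
static void model_common_hold(struct model_common *c)
{
    assert(c != NULL);
    c->refcnt++;
}
```

The generic helper is compiled once and shared, while each module recovers its own object with a one-line macro instead of instantiating templated functions.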
2006-09-22 | [RTNETLINK]: Use rtnl_unicast() for rtnetlink unicasts | Thomas Graf
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 | [HTB]: rbtree cleanup | Stephen Hemminger
Add code to initialize rb tree nodes, and check for double deletion. This is not a real fix, but I can make it trap sometimes and may be a bandaid for: http://bugzilla.kernel.org/show_bug.cgi?id=6681
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 | [HTB]: Use hlist for hash lists. | Stephen Hemminger
Use hlist instead of list for the hash list. This saves space, and we can check for double delete better.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 | [HTB]: Lindent | Stephen Hemminger
Code was a mess in terms of indentation. Run it through the Lindent script, and clean up the damage. Also, don't use the vim magic comment, and substitute inline for __inline__.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 | [HTB]: HTB_HYSTERESIS cleanup | Stephen Hemminger
Change the conditional compilation around HTB_HYSTERESIS since the code was being split mid-expression.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 | [HTB]: Remove lock macro. | Stephen Hemminger
Get rid of the macros being used to obscure the locking.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 | [HTB]: Remove broken debug code. | Stephen Hemminger
The HTB network scheduler had debug code that wouldn't compile and that confused and obfuscated the code; remove it.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 | [NET]: Replace CHECKSUM_HW by CHECKSUM_PARTIAL/CHECKSUM_COMPLETE | Patrick McHardy
Replace CHECKSUM_HW by CHECKSUM_PARTIAL (for outgoing packets, whose checksum still needs to be completed) and CHECKSUM_COMPLETE (for incoming packets, device supplied full checksum).

Patch originally from Herbert Xu, updated by myself for 2.6.18-rc3.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-18 | [NET]: Drop tx lock in dev_watchdog_up | Herbert Xu
Fix lockdep warning with GRE, iptables and Speedtouch ADSL, PPP over ATM.

On Sat, Sep 02, 2006 at 08:39:28PM +0000, Krzysztof Halasa wrote:
>
> =======================================================
> [ INFO: possible circular locking dependency detected ]
> -------------------------------------------------------
> swapper/0 is trying to acquire lock:
>  (&dev->queue_lock){-+..}, at: [<c02c8c46>] dev_queue_xmit+0x56/0x290
>
> but task is already holding lock:
>  (&dev->_xmit_lock){-+..}, at: [<c02c8e14>] dev_queue_xmit+0x224/0x290
>
> which lock already depends on the new lock.

This turns out to be a genuine bug. The queue lock and xmit lock are intentionally taken out of order. Two things are supposed to prevent dead-locks from occurring:

1) When we hold the queue_lock we're supposed to only do try_lock on the tx_lock.

2) We always drop the queue_lock after taking the tx_lock and before doing anything else.

>
> the existing dependency chain (in reverse order) is:
>
> -> #1 (&dev->_xmit_lock){-+..}:
>        [<c012e7b6>] lock_acquire+0x76/0xa0
>        [<c0336241>] _spin_lock_bh+0x31/0x40
>        [<c02d25a9>] dev_activate+0x69/0x120

This path obviously breaks assumption 1) and therefore can lead to ABBA dead-locks.

I've looked at the history and there seems to be no reason for the lock to be held at all in dev_watchdog_up. The lock appeared on day one and even there it was unnecessary. In fact, people added __dev_watchdog_up precisely in order to get around the tx lock there.

The function dev_watchdog_up is already serialised by rtnl_lock since its only caller dev_activate is always called under it.

So here is a simple patch to remove the tx lock from dev_watchdog_up. In 2.6.19 we can eliminate the unnecessary __dev_watchdog_up and replace it with dev_watchdog_up.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-08-17 | [PKT_SCHED] cls_u32: Fix typo. | Ralf Hildebrandt
Signed-off-by: Ralf Hildebrandt <Ralf.Hildebrandt@charite.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-08-04 | [PKT_SCHED]: Return ENOENT if qdisc module is unavailable | Jamal Hadi Salim
Return ENOENT if qdisc module is unavailable
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-07-21 | [NET]: Conversions from kmalloc+memset to k(z|c)alloc. | Panagiotis Issaris
Signed-off-by: Panagiotis Issaris <takis@issaris.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-07-21 | [PKT_SCHED] netem: Fix slab corruption with netem (2nd try) | Guillaume Chazarain
CONFIG_DEBUG_SLAB found the following bug:

netem_enqueue() in sch_netem.c gets a pointer inside a slab object:

    struct netem_skb_cb *cb = (struct netem_skb_cb *)skb->cb;

But then, the slab object may be freed:

    skb = skb_unshare(skb, GFP_ATOMIC)

cb is still pointing inside the freed skb, so here is a patch to initialize cb later, and make it clear that initializing it sooner is a bad idea.

[From Stephen Hemminger: leave cb uninitialized in order to let gcc complain in case of use before initialization]

Signed-off-by: Guillaume Chazarain <guichaz@yahoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
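The ordering problem can be modelled in user space. This is a sketch under stated assumptions: `model_skb`, `model_cb`, `model_unshare()` and `model_enqueue()` are hypothetical stand-ins (built on malloc/free) for the kernel objects named in the commit, not the actual netem code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-packet control block. */
struct model_cb {
    long time_to_send;
};

/* Hypothetical packet buffer with an embedded control block. */
struct model_skb {
    int shared;            /* non-zero: someone else also references the data */
    struct model_cb cb;
};

/*
 * Stand-in for an "unshare" operation: a shared buffer is replaced by a
 * private copy and the original is freed, so the returned pointer may
 * differ from the argument. Returns NULL if the copy cannot be allocated.
 */
static struct model_skb *model_unshare(struct model_skb *skb)
{
    struct model_skb *copy;

    if (!skb->shared)
        return skb;
    copy = malloc(sizeof(*copy));
    if (copy != NULL) {
        memcpy(copy, skb, sizeof(*copy));
        copy->shared = 0;
    }
    free(skb);
    return copy;
}

/* Fixed ordering: take the pointer into the buffer only after unsharing. */
static struct model_skb *model_enqueue(struct model_skb *skb, long when)
{
    struct model_cb *cb;   /* deliberately left uninitialized until it is valid */

    assert(skb != NULL);
    skb = model_unshare(skb);
    if (skb == NULL)
        return NULL;
    cb = &skb->cb;         /* points into the buffer that is actually kept */
    cb->time_to_send = when;
    return skb;
}
```

Had `cb` been computed before `model_unshare()`, the write through it would land in freed memory whenever a copy was made.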
2006-07-15 | [PATCH] sch_htb compile fix. | Dave Jones
net/sched/sch_htb.c: In function 'htb_change_class':
net/sched/sch_htb.c:1605: error: expected ';' before 'do_gettimeofday'
Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-14 | [PKT_SCHED] HTB: initialize upper bound properly | Stephen Hemminger
The upper bound for HTB time diff needs to be scaled to PSCHED units rather than just assuming usecs. The field mbuffer is used in TDIFF_SAFE(), as an upper bound.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Acked-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-07-12 | [MAINTAINERS]: Add proper entry for TC classifier | Stephen Hemminger
Acked-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-07-09 | [PKT_SCHED]: act_api: Fix module leak while flushing actions | Thomas Graf
Module reference needs to be given back if message header construction fails.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-07-05 | [PKT_SCHED]: Fix error handling while dumping actions | Thomas Graf
"return -err" and blindly inheriting the error code in the netlink failure exception handler cause error codes to be returned as positive values, so the caller ignores them. This may lead to sending out incomplete netlink messages.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-07-05 | [PKT_SCHED]: Return ENOENT if action module is unavailable | Thomas Graf
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-07-05 | [PKT_SCHED]: Fix illegal memory dereferences when dumping actions | Thomas Graf
The TCA_ACT_KIND attribute is used without checking its availability when dumping actions, leading to a value of 0x4 being dereferenced.

The use of strcmp() in tc_lookup_action_n() isn't safe when fed with a string from an attribute without enforcing proper NUL termination.

Both bugs can be triggered with a malformed netlink message and don't require any privileges.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
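Both checks (attribute present, payload NUL-terminated) can be sketched with a small helper that validates before calling strcmp(). `model_attr_streq()` is a hypothetical name used for illustration; it does not reproduce the kernel's netlink attribute API or the actual fix.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Compare an untrusted attribute payload with a known name.
 * Returns 1 only if the payload is present, contains a NUL byte within
 * its declared length, and equals `name`; returns 0 otherwise.
 */
static int model_attr_streq(const char *payload, size_t len, const char *name)
{
    assert(name != NULL);
    if (payload == NULL || len == 0)
        return 0;                    /* attribute missing or empty */
    if (memchr(payload, '\0', len) == NULL)
        return 0;                    /* not NUL-terminated: strcmp() would overrun */
    return strcmp(payload, name) == 0;
}
```

The memchr() bound is what makes the later strcmp() safe: without it, a payload lacking a terminator would be read past its end.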
2006-06-30 | Remove obsolete #include <linux/config.h> | Jörn Engel
Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-06-30 | Kconfig: Typos in net/sched/Kconfig | Matt LaPlante
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-06-23 | [NET]: Add generic segmentation offload | Herbert Xu
This patch adds the infrastructure for generic segmentation offload. The idea is to tap into the potential savings of TSO without hardware support by postponing the allocation of segmented skb's until just before the entry point into the NIC driver.

The same structure can be used to support software IPv6 TSO, as well as UFO and segmentation offload for other relevant protocols, e.g., DCCP.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-23 | [NET]: Prevent transmission after dev_deactivate | Herbert Xu
The dev_deactivate function has bit-rotted since the introduction of lockless drivers. In particular, the spin_unlock_wait call at the end has no effect on the xmit routine of lockless drivers.

With a little bit of work, we can make it much more useful by providing the guarantee that when it returns, no more calls to the xmit routine of the underlying driver will be made.

The idea is simple. There are two entry points into the xmit routine. The first comes from dev_queue_xmit. That one is easily stopped by using synchronize_rcu. This works because we set the qdisc to noop_qdisc before the synchronize_rcu call. That in turn causes all subsequent packets sent to dev_queue_xmit to be dropped. The synchronize_rcu call also ensures all outstanding calls leave their critical section.

The other entry point is from qdisc_run. Since we now have a bit that indicates whether it's running, all we have to do is to wait until the bit is off.

I've removed the loop to wait for __LINK_STATE_SCHED to clear. This is useless because netif_wake_queue can cause it to be set again. It is also harmless because we've disarmed qdisc_run.

I've also removed the spin_unlock_wait on xmit_lock because its only purpose of making sure that all outstanding xmit_lock holders have exited is also given by dev_watchdog_down.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-19 | [NET]: Prevent multiple qdisc runs | Herbert Xu
Having two or more qdisc_run's contend against each other is bad because it can induce packet reordering if the packets have to be requeued. It appears that this is an unintended consequence of relinquishing the queue lock while transmitting. That in turn is needed for devices that spend a lot of time in their transmit routine.

There are no advantages to be had as devices with queues are inherently single-threaded (the loopback device is not but then it doesn't have a queue).

Even if you were to add a queue to a parallel virtual device (e.g., bolt a tbf filter in front of an ipip tunnel device), you would still want to process the queue in sequence to ensure that the packets are ordered correctly.

The solution here is to steal a bit from net_device to prevent this.

BTW, as qdisc_restart is no longer used by anyone as a module inside the kernel (IIRC it used to with netif_wake_queue), I have not exported the new __qdisc_run function.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>