linux.git - Linux kernel

Age	Commit message (Collapse)	Author
2020-04-15	net: mscc: ocelot: fix untagged packet drops when enslaving to vlan aware bridge	Vladimir Oltean
	To rehash a previous explanation given in commit 1c44ce560b4d ("net: mscc: ocelot: fix vlan_filtering when enslaving to bridge before link is up"), the switch driver operates the in a mode where a single VLAN can be transmitted as untagged on a particular egress port. That is the "native VLAN on trunk port" use case. The configuration for this native VLAN is driven in 2 ways: - Set the egress port rewriter to strip the VLAN tag for the native VID (as it is egress-untagged, after all). - Configure the ingress port to drop untagged and priority-tagged traffic, if there is no native VLAN. The intention of this setting is that a trunk port with no native VLAN should not accept untagged traffic. Since both of the above configurations for the native VLAN should only be done if VLAN awareness is requested, they are actually done from the ocelot_port_vlan_filtering function, after the basic procedure of toggling the VLAN awareness flag of the port. But there's a problem with that simplistic approach: we are trying to juggle with 2 independent variables from a single function: - Native VLAN of the port - its value is held in port->vid. - VLAN awareness state of the port - currently there are some issues here, more on that later. The actual problem can be seen when enslaving the switch ports to a VLAN filtering bridge: 0. The driver configures a pvid of zero for each port, when in standalone mode. While the bridge configures a default_pvid of 1 for each port that gets added as a slave to it. 1. The bridge calls ocelot_port_vlan_filtering with vlan_aware=true. The VLAN-filtering-dependent portion of the native VLAN configuration is done, considering that the native VLAN is 0. 2. The bridge calls ocelot_vlan_add with vid=1, pvid=true, untagged=true. The native VLAN changes to 1 (change which gets propagated to hardware). 3. ??? - nobody calls ocelot_port_vlan_filtering again, to reapply the VLAN-filtering-dependent portion of the native VLAN configuration, for the new native VLAN of 1. One can notice that after toggling "ip link set dev br0 type bridge vlan_filtering 0 && ip link set dev br0 type bridge vlan_filtering 1", the new native VLAN finally makes it through and untagged traffic finally starts flowing again. But obviously that shouldn't be needed. So it is clear that 2 independent variables need to both re-trigger the native VLAN configuration. So we introduce the second variable as ocelot_port->vlan_aware. Actually both the DSA Felix driver and the Ocelot driver already had each its own variable: - Ocelot: ocelot_port_private->vlan_aware - Felix: dsa_port->vlan_filtering but the common Ocelot library needs to work with a single, common, variable, so there is some refactoring done to move the vlan_aware property from the private structure into the common ocelot_port structure. Fixes: 97bb69e1e36e ("net: mscc: ocelot: break apart ocelot_vlan_port_apply") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-15	Merge tag 'mac80211-for-net-2020-04-15' of ↵	David S. Miller
	git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Johannes Berg says: ==================== A couple of fixes: * FTM responder policy netlink validation fix (but the only user validates again later) * kernel-doc fixes * a fix for a race in mac80211 radio registration vs. userspace * a mesh channel switch fix * a fix for a syzbot reported kasprintf() issue ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-14	net: marvell10g: soft-reset the PHY when coming out of low power	Russell King
	Soft-reset the PHY when coming out of low power mode, which seems to be necessary with firmware versions 0.3.3.0 and 0.3.10.0. This depends on ("net: marvell10g: report firmware version") Fixes: c9cc1c815d36 ("net: phy: marvell10g: place in powersave mode at probe") Reported-by: Matteo Croce <mcroce@redhat.com> Tested-by: Matteo Croce <mcroce@redhat.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-14	net: marvell10g: report firmware version	Russell King
	Report the firmware version when probing the PHY to allow issues attributable to firmware to be diagnosed. Tested-by: Matteo Croce <mcroce@redhat.com> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-14	net/cxgb4: Check the return from t4_query_params properly	Jason Gunthorpe
	Positive return values are also failures that don't set val, although this probably can't happen. Fixes gcc 10 warning: drivers/net/ethernet/chelsio/cxgb4/t4_hw.c: In function ‘t4_phy_fw_ver’: drivers/net/ethernet/chelsio/cxgb4/t4_hw.c:3747:14: warning: ‘val’ may be used uninitialized in this function [-Wmaybe-uninitialized] 3747 \| *phy_fw_ver = val; Fixes: 01b6961410b7 ("cxgb4: Add PHY firmware support for T420-BT cards") Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-14	net: stmmac: socfpga: Allow all RGMII modes	Atsushi Nemoto
	Allow all the RGMII modes to be used. (Not only "rgmii", "rgmii-id" but "rgmii-txid", "rgmii-rxid") Signed-off-by: Atsushi Nemoto <atsushi.nemoto@sord.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-14	net: dsa: mv88e6xxx: Configure MAC when using fixed link	Andrew Lunn
	The 88e6185 is reporting it has detected a PHY, when a port is connected to an SFP. As a result, the fixed-phy configuration is not being applied. That then breaks packet transfer, since the port is reported as being down. Add additional conditions to check the interface mode, and if it is fixed always configure the port on link up/down, independent of the PPU status. Fixes: 30c4a5b0aad8 ("net: mv88e6xxx: use resolved link config in mac_link_up()") Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-14	ionic: fix unused assignment	Shannon Nelson
	Remove an unused initialized value. Fixes: 7e4d47596b68 ("ionic: replay filters after fw upgrade") Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Shannon Nelson <snelson@pensando.io> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-14	ionic: add dynamic_debug header	Shannon Nelson
	Add the appropriate header for using dynamic_hex_dump(), which seems to be incidentally included in some configurations but not all. Fixes: 7e4d47596b68 ("ionic: replay filters after fw upgrade") Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Shannon Nelson <snelson@pensando.io> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-14	net: phy: micrel: use genphy_read_status for KSZ9131	Atsushi Nemoto
	KSZ9131 will not work with some switches due to workaround for KSZ9031 introduced in commit d2fd719bcb0e83cb39cfee22ee800f98a56eceb3 ("net/phy: micrel: Add workaround for bad autoneg"). Use genphy_read_status instead of dedicated ksz9031_read_status. Fixes: bff5b4b37372 ("net: phy: micrel: add Microchip KSZ9131 initial driver") Signed-off-by: Atsushi Nemoto <atsushi.nemoto@sord.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-14	Merge tag 'wireless-drivers-2020-04-14' of ↵	David S. Miller
	git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers Kalle Valo says: ==================== wireless-drivers fixes for v5.7 First set of fixes for v5.6. Fixes for a crash and for two compiler warnings. brcmfmac * fix a crash related to monitor interface ath11k * fix compiler warnings without CONFIG_THERMAL rtw88 * fix compiler warnings related to suspend and resume functions ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-14	rtw88: avoid unused function warnings	Arnd Bergmann
	The rtw88 driver defines emtpy functions with multiple indirections but gets one of these wrong: drivers/net/wireless/realtek/rtw88/pci.c:1347:12: error: 'rtw_pci_resume' defined but not used [-Werror=unused-function] 1347 \| static int rtw_pci_resume(struct device dev) \| ^~~~~~~~~~~~~~ drivers/net/wireless/realtek/rtw88/pci.c:1342:12: error: 'rtw_pci_suspend' defined but not used [-Werror=unused-function] 1342 \| static int rtw_pci_suspend(struct device dev) Better simplify it to rely on the conditional reference in SIMPLE_DEV_PM_OPS(), and mark the functions as __maybe_unused to avoid warning about it. I'm not sure if these are needed at all given that the functions don't do anything, but they were only recently added. Fixes: 44bc17f7f5b3 ("rtw88: support wowlan feature for 8822c") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/20200408185413.218643-1-arnd@arndb.de
2020-04-14	mac80211_hwsim: Use kstrndup() in place of kasprintf()	Tuomas Tynkkynen
	syzbot reports a warning: precision 33020 too large WARNING: CPU: 0 PID: 9618 at lib/vsprintf.c:2471 set_precision+0x150/0x180 lib/vsprintf.c:2471 vsnprintf+0xa7b/0x19a0 lib/vsprintf.c:2547 kvasprintf+0xb2/0x170 lib/kasprintf.c:22 kasprintf+0xbb/0xf0 lib/kasprintf.c:59 hwsim_del_radio_nl+0x63a/0x7e0 drivers/net/wireless/mac80211_hwsim.c:3625 genl_family_rcv_msg_doit net/netlink/genetlink.c:672 [inline] ... entry_SYSCALL_64_after_hwframe+0x49/0xbe Thus it seems that kasprintf() with "%.*s" format can not be used for duplicating a string with arbitrary length. Replace it with kstrndup(). Note that later this string is limited to NL80211_WIPHY_NAME_MAXLEN == 64, but the code is simpler this way. Reported-by: syzbot+6693adf1698864d21734@syzkaller.appspotmail.com Reported-by: syzbot+a4aee3f42d7584d76761@syzkaller.appspotmail.com Cc: stable@kernel.org Signed-off-by: Tuomas Tynkkynen <tuomas.tynkkynen@iki.fi> Link: https://lore.kernel.org/r/20200410123257.14559-1-tuomas.tynkkynen@iki.fi [johannes: add note about length limit] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-04-12	net: mvneta: Fix a typo	Christophe JAILLET
	s/mvmeta/mvneta/ Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-12	net: tun: record RX queue in skb before do_xdp_generic()	Gilberto Bertin
	This allows netif_receive_generic_xdp() to correctly determine the RX queue from which the skb is coming, so that the context passed to the XDP program will contain the correct RX queue index. Signed-off-by: Gilberto Bertin <me@jibi.io> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-12	net: ethernet: ti: Add missing '\n' in log messages	Christophe JAILLET
	Message logged by 'dev_xxx()' or 'pr_xxx()' should end with a '\n'. Fixes: 93a76530316a ("net: ethernet: ti: introduce am65x/j721e gigabit eth subsystem driver") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-04-12	soc: qcom: ipa: Add a missing '\n' in a log message	Christophe JAILLET
	Message logged by 'dev_xxx()' or 'pr_xxx()' should end with a '\n'. Fixes: a646d6ec9098 ("soc: qcom: ipa: modem and microcontroller") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-04-12	net: neterion: remove redundant assignment to variable tmp64	Colin Ian King
	The variable tmp64 is being initialized with a value that is never read and it is being updated later with a new value. The initialization is redundant and can be removed. Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-04-11	net: phy: marvell: Fix pause frame negotiation	Clemens Gruber
	The negotiation of flow control / pause frame modes was broken since commit fcf1f59afc67 ("net: phy: marvell: rearrange to use genphy_read_lpa()") moved the setting of phydev->duplex below the phy_resolve_aneg_pause call. Due to a check of DUPLEX_FULL in that function, phydev->pause was no longer set. Fix it by moving the parsing of the status variable before the blocks dealing with the pause frames. As the Marvell 88E1510 datasheet does not specify the timing between the link status and the "Speed and Duplex Resolved" bit, we have to force the link down as long as the resolved bit is not set, to avoid reporting link up before we even have valid Speed/Duplex. Tested with a Marvell 88E1510 (RGMII to Copper/1000Base-T) Fixes: fcf1f59afc67 ("net: phy: marvell: rearrange to use genphy_read_lpa()") Signed-off-by: Clemens Gruber <clemens.gruber@pqgruber.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-04-09	net: macsec: fix using wrong structure in macsec_changelink()	Taehee Yoo
	In the macsec_changelink(), "struct macsec_tx_sa tx_sc" is used to store "macsec_secy.tx_sc". But, the struct type of tx_sc is macsec_tx_sc, not macsec_tx_sa. So, the macsec_tx_sc should be used instead. Test commands: ip link add dummy0 type dummy ip link add macsec0 link dummy0 type macsec ip link set macsec0 type macsec encrypt off Splat looks like: [61119.963483][ T9335] ================================================================== [61119.964709][ T9335] BUG: KASAN: slab-out-of-bounds in macsec_changelink.part.34+0xb6/0x200 [macsec] [61119.965787][ T9335] Read of size 160 at addr ffff888020d69c68 by task ip/9335 [61119.966699][ T9335] [61119.966979][ T9335] CPU: 0 PID: 9335 Comm: ip Not tainted 5.6.0+ #503 [61119.967791][ T9335] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [61119.968914][ T9335] Call Trace: [61119.969324][ T9335] dump_stack+0x96/0xdb [61119.969809][ T9335] ? macsec_changelink.part.34+0xb6/0x200 [macsec] [61119.970554][ T9335] print_address_description.constprop.5+0x1be/0x360 [61119.971294][ T9335] ? macsec_changelink.part.34+0xb6/0x200 [macsec] [61119.971973][ T9335] ? macsec_changelink.part.34+0xb6/0x200 [macsec] [61119.972703][ T9335] __kasan_report+0x12a/0x170 [61119.973323][ T9335] ? macsec_changelink.part.34+0xb6/0x200 [macsec] [61119.973942][ T9335] kasan_report+0xe/0x20 [61119.974397][ T9335] check_memory_region+0x149/0x1a0 [61119.974866][ T9335] memcpy+0x1f/0x50 [61119.975209][ T9335] macsec_changelink.part.34+0xb6/0x200 [macsec] [61119.975825][ T9335] ? macsec_get_stats64+0x3e0/0x3e0 [macsec] [61119.976451][ T9335] ? kernel_text_address+0x111/0x120 [61119.976990][ T9335] ? pskb_expand_head+0x25f/0xe10 [61119.977503][ T9335] ? stack_trace_save+0x82/0xb0 [61119.977986][ T9335] ? memset+0x1f/0x40 [61119.978397][ T9335] ? __nla_validate_parse+0x98/0x1ab0 [61119.978936][ T9335] ? macsec_alloc_tfm+0x90/0x90 [macsec] [61119.979511][ T9335] ? __kasan_slab_free+0x111/0x150 [61119.980021][ T9335] ? kfree+0xce/0x2f0 [61119.980700][ T9335] ? netlink_trim+0x196/0x1f0 [61119.981420][ T9335] ? nla_memcpy+0x90/0x90 [61119.982036][ T9335] ? register_lock_class+0x19e0/0x19e0 [61119.982776][ T9335] ? memcpy+0x34/0x50 [61119.983327][ T9335] __rtnl_newlink+0x922/0x1270 [ ... ] Fixes: 3cf3227a21d1 ("net: macsec: hardware offloading infrastructure") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-08	net/mlx5e: CT: Use rhashtable's ct entries instead of a separate list	Paul Blakey
	Fixes CT entries list corruption. After allowing parallel insertion/removals in upper nf flow table layer, unprotected ct entries list can be corrupted by parallel add/del on the same flow table. CT entries list is only used while freeing a ct zone flow table to go over all the ct entries offloaded on that zone/table, and flush the table. As rhashtable already provides an api to go over all the inserted entries, fix the race by using the rhashtable iteration instead, and remove the list. Fixes: 7da182a998d6 ("netfilter: flowtable: Use work entry per offload command") Reviewed-by: Oz Shlomo <ozsh@mellanox.com> Signed-off-by: Paul Blakey <paulb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-04-08	net/mlx5e: Fix devlink port netdev unregistration sequence	Parav Pandit
	In cited commit netdevice is registered after devlink port. Unregistration flow should be mirror sequence of registration flow. Hence, unregister netdevice before devlink port. Fixes: 31e87b39ba9d ("net/mlx5e: Fix devlink port register sequence") Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-04-08	net/mlx5e: Fix pfnum in devlink port attribute	Parav Pandit
	Cited patch missed to extract PCI pf number accurately for PF and VF port flavour. It considered PCI device + function number. Due to this, device having non zero device number shown large pfnum. Hence, use only PCI function number; to avoid similar errors, derive pfnum one time for all port flavours. Fixes: f60f315d339e ("net/mlx5e: Register devlink ports for physical link, PCI PF, VFs") Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-04-08	net/mlx5e: Fix missing pedit action after ct clear action	Roi Dayan
	With ct clear action we should not allocate the action in hw and not release the mod_acts parsed in advance. It will be done when handling the ct clear action. Fixes: 1ef3018f5af3 ("net/mlx5e: CT: Support clear action") Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Paul Blakey <paulb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-04-08	net/mlx5e: Fix nest_level for vlan pop action	Dmytro Linkin
	Current value of nest_level, assigned from net_device lower_level value, does not reflect the actual number of vlan headers, needed to pop. For ex., if we have untagged ingress traffic sended over vlan devices, instead of one pop action, driver will perform two pop actions. To fix that, calculate nest_level as difference between vlan device and parent device lower_levels. Fixes: f3b0a18bb6cb ("net: remove unnecessary variables and callback") Signed-off-by: Dmytro Linkin <dmitrolin@mellanox.com> Signed-off-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-04-08	net/mlx5e: Add missing release firmware call	Eran Ben Elisha
	Once driver finishes flashing the firmware image, it should release it. Fixes: 9c8bca2637b8 ("mlx5: Move firmware flash implementation to devlink") Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Reviewed-by: Aya Levin <ayal@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-04-08	net/mlx5: Fix condition for termination table cleanup	Eli Cohen
	When we destroy rules from slow path we need to avoid destroying termination tables since termination tables are never created in slow path. By doing so we avoid destroying the termination table created for the slow path. Fixes: d8a2034f152a ("net/mlx5: Don't use termination tables in slow path") Signed-off-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Oz Shlomo <ozsh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-04-08	net/mlx5: Fix frequent ioread PCI access during recovery	Moshe Shemesh
	High frequency of PCI ioread calls during recovery flow may cause the following trace on powerpc: [ 248.670288] EEH: 2100000 reads ignored for recovering device at location=Slot1 driver=mlx5_core pci addr=0000:01:00.1 [ 248.670331] EEH: Might be infinite loop in mlx5_core driver [ 248.670361] CPU: 2 PID: 35247 Comm: kworker/u192:11 Kdump: loaded Tainted: G OE ------------ 4.14.0-115.14.1.el7a.ppc64le #1 [ 248.670425] Workqueue: mlx5_health0000:01:00.1 health_recover_work [mlx5_core] [ 248.670471] Call Trace: [ 248.670492] [c00020391c11b960] [c000000000c217ac] dump_stack+0xb0/0xf4 (unreliable) [ 248.670548] [c00020391c11b9a0] [c000000000045818] eeh_check_failure+0x5c8/0x630 [ 248.670631] [c00020391c11ba50] [c00000000068fce4] ioread32be+0x114/0x1c0 [ 248.670692] [c00020391c11bac0] [c00800000dd8b400] mlx5_error_sw_reset+0x160/0x510 [mlx5_core] [ 248.670752] [c00020391c11bb60] [c00800000dd75824] mlx5_disable_device+0x34/0x1d0 [mlx5_core] [ 248.670822] [c00020391c11bbe0] [c00800000dd8affc] health_recover_work+0x11c/0x3c0 [mlx5_core] [ 248.670891] [c00020391c11bc80] [c000000000164fcc] process_one_work+0x1bc/0x5f0 [ 248.670955] [c00020391c11bd20] [c000000000167f8c] worker_thread+0xac/0x6b0 [ 248.671015] [c00020391c11bdc0] [c000000000171618] kthread+0x168/0x1b0 [ 248.671067] [c00020391c11be30] [c00000000000b65c] ret_from_kernel_thread+0x5c/0x80 Reduce the PCI ioread frequency during recovery by using msleep() instead of cond_resched() Fixes: 3e5b72ac2f29 ("net/mlx5: Issue SW reset on FW assert") Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Reviewed-by: Feras Daoud <ferasda@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-04-08	ionic: set station addr only if needed	Shannon Nelson
	The code was working too hard and in some cases was trying to delete filters that weren't there, generating a potentially misleading error message. IONIC_CMD_RX_FILTER_DEL (32) failed: IONIC_RC_ENOENT (-2) Fixes: 2a654540be10 ("ionic: Add Rx filter and rx_mode ndo support") Signed-off-by: Shannon Nelson <snelson@pensando.io> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-08	ionic: replay filters after fw upgrade	Shannon Nelson
	The NIC's filters are lost in the midst of the fw-upgrade so we need to replay them into the FW. We also remove the unused ionic_rx_filter_del() function. Fixes: c672412f6172 ("ionic: remove lifs on fw reset") Signed-off-by: Shannon Nelson <snelson@pensando.io> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-07	Documentation: mdio_bus.c - fix warnings	Lothar Rubusch
	Fix wrong parameter description and related warnings at 'make htmldocs'. Signed-off-by: Lothar Rubusch <l.rubusch@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Tested-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-07	net: ethernet: mediatek: move mt7623 settings out off the mt7530	René van Dorst
	Moving mt7623 logic out off mt7530, is required to make hardware setting consistent after we introduce phylink to mtk driver. Fixes: b8fc9f30821e ("net: ethernet: mediatek: Add basic PHYLINK support") Reviewed-by: Sean Wang <sean.wang@mediatek.com> Tested-by: Sean Wang <sean.wang@mediatek.com> Signed-off-by: René van Dorst <opensource@vdorst.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-07	net: dsa: mt7530: move mt7623 settings out off the mt7530	René van Dorst
	Moving mt7623 logic out off mt7530, is required to make hardware setting consistent after we introduce phylink to mtk driver. Fixes: ca366d6c889b ("net: dsa: mt7530: Convert to PHYLINK API") Reviewed-by: Sean Wang <sean.wang@mediatek.com> Tested-by: Sean Wang <sean.wang@mediatek.com> Signed-off-by: René van Dorst <opensource@vdorst.com> Tested-by: Frank Wunderlich <frank-w@public-files.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-07	net: fec: set GPR bit on suspend by DT configuration.	Martin Fuzzey
	On some SoCs, such as the i.MX6, it is necessary to set a bit in the SoC level GPR register before suspending for wake on lan to work. The fec platform callback sleep_mode_enable was intended to allow this but the platform implementation was NAK'd back in 2015 [1] This means that, currently, wake on lan is broken on mainline for the i.MX6 at least. So implement the required bit setting in the fec driver by itself by adding a new optional DT property indicating the GPR register and adding the offset and bit information to the driver. [1] https://www.spinics.net/lists/netdev/msg310922.html Signed-off-by: Martin Fuzzey <martin.fuzzey@flowbird.group> Signed-off-by: Fugang Duan <fugang.duan@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-07	Merge branch 'akpm' (patches from Andrew)	Linus Torvalds
	Merge more updates from Andrew Morton: - a lot more of MM, quite a bit more yet to come: (memcg, pagemap, vmalloc, pagealloc, migration, thp, ksm, madvise, virtio, userfaultfd, memory-hotplug, shmem, rmap, zswap, zsmalloc, cleanups) - various other subsystems (procfs, misc, MAINTAINERS, bitops, lib, checkpatch, epoll, binfmt, kallsyms, reiserfs, kmod, gcov, kconfig, ubsan, fault-injection, ipc) * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (158 commits) ipc/shm.c: make compat_ksys_shmctl() static ipc/mqueue.c: fix a brace coding style issue lib/Kconfig.debug: fix a typo "capabilitiy" -> "capability" ubsan: include bug type in report header kasan: unset panic_on_warn before calling panic() ubsan: check panic_on_warn drivers/misc/lkdtm/bugs.c: add arithmetic overflow and array bounds checks ubsan: split "bounds" checker from other options ubsan: add trap instrumentation option init/Kconfig: clean up ANON_INODES and old IO schedulers options kernel/gcov/fs.c: replace zero-length array with flexible-array member gcov: gcc_3_4: replace zero-length array with flexible-array member gcov: gcc_4_7: replace zero-length array with flexible-array member kernel/kmod.c: fix a typo "assuems" -> "assumes" reiserfs: clean up several indentation issues kallsyms: unexport kallsyms_lookup_name() and kallsyms_on_each_symbol() samples/hw_breakpoint: drop use of kallsyms_lookup_name() samples/hw_breakpoint: drop HW_BREAKPOINT_R when reporting writes fs/binfmt_elf.c: don't free interpreter's ELF pheaders on common path fs/binfmt_elf.c: allocate less for static executable ...
2020-04-07	Merge tag 'for-linus-5.7-rc1' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs Pull UBI and UBIFS updates from Richard Weinberger: - Fix for memory leaks around UBIFS orphan handling - Fix for memory leaks around UBI fastmap - Remove zero-length array from ubi-media.h - Fix for TNC lookup in UBIFS orphan code * tag 'for-linus-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs: ubi: ubi-media.h: Replace zero-length array with flexible-array member ubifs: Fix out-of-bounds memory access caused by abnormal value of node_len ubi: fastmap: Only produce the initial anchor PEB when fastmap is used ubi: fastmap: Free unused fastmap anchor peb during detach ubifs: ubifs_add_orphan: Fix a memory leak bug ubifs: ubifs_jnl_write_inode: Fix a memory leak bug ubifs: Fix ubifs_tnc_lookup() usage in do_kill_orphans()
2020-04-07	Merge branch 'parisc-5.7-1' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux Pull parisc updates from Helge Deller: "Some cleanups in arch_rw locking functions, improved interrupt handling in arch spinlocks, coversions to request_irq() and syscall table generation cleanups" * 'parisc-5.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux: parisc: remove nargs from __SYSCALL parisc: Refactor alternative code to accept multiple conditions parisc: Rework arch_rw locking functions parisc: Improve interrupt handling in arch_spin_lock_flags() parisc: Replace setup_irq() by request_irq()
2020-04-07	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/ide	Linus Torvalds
	Pull IDE update from David Miller: "As usual, very quiet in this subsystem. Just a list_for_each_entry_safe() conversion" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/ide: drivers/ide: Fix build regression. drivers/ide: convert to list_for_each_entry_safe()
2020-04-07	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	Linus Torvalds
	Pull networking fixes from David Miller: 1) Slave bond and team devices should not be assigned ipv6 link local addresses, from Jarod Wilson. 2) Fix clock sink config on some at803x PHY devices, from Oleksij Rempel. 3) Uninitialized stack space transmitted in slcan frames, fix from Richard Palethorpe. 4) Guard HW VLAN ops properly in stmmac driver, from Jose Abreu. 5) "=" --> "\|=" fix in aquantia driver, from Colin Ian King. 6) Fix TCP fallback in mptcp, from Florian Westphal. (accessing a plain tcp_sk as if it were an mptcp socket). 7) Fix cavium driver in some configurations wrt. PTP, from Yue Haibing. 8) Make ipv6 and ipv4 consistent in the lower bound allowed for neighbour entry retrans_time, from Hangbin Liu. 9) Don't use private workqueue in pegasus usb driver, from Petko Manolov. 10) Fix integer overflow in mlxsw, from Colin Ian King. 11) Missing refcnt init in cls_tcindex, from Cong Wang. 12) One too many loop iterations when processing cmpri entries in ipv6 rpl code, from Alexander Aring. 13) Disable SG and TSO by default in r8169, from Heiner Kallweit. 14) NULL deref in macsec, from Davide Caratti. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (42 commits) macsec: fix NULL dereference in macsec_upd_offload() skbuff.h: Improve the checksum related comments net: dsa: bcm_sf2: Ensure correct sub-node is parsed qed: remove redundant assignment to variable 'rc' wimax: remove some redundant assignments to variable result mlxsw: spectrum_flower: Do not stop at FLOW_ACTION_VLAN_MANGLE mlxsw: spectrum_flower: Do not stop at FLOW_ACTION_PRIORITY r8169: change back SG and TSO to be disabled by default net: dsa: bcm_sf2: Do not register slave MDIO bus with OF ipv6: rpl: fix loop iteration tun: Don't put_page() for all negative return values from XDP program net: dsa: mt7530: fix null pointer dereferencing in port5 setup mptcp: add some missing pr_fmt defines net: phy: micrel: kszphy_resume(): add delay after genphy_resume() before accessing PHY registers net_sched: fix a missing refcnt in tcindex_init() net: stmmac: dwmac1000: fix out-of-bounds mac address reg setting mlxsw: spectrum_trap: fix unintention integer overflow on left shift pegasus: Remove pegasus' own workqueue neigh: support smaller retrans_time settting net: openvswitch: use hlist_for_each_entry_rcu instead of hlist_for_each_entry ...
2020-04-07	Merge branch 'pcmcia-next' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/brodo/linux Pull pcmcia updates from Dominik Brodowski: "A few PCMCIA odd fixes: removing a few spaces and useless casts, replacing snprintf() with scnprintf(), and replacing zero-length arrays with a flexible-array member" * 'pcmcia-next' of git://git.kernel.org/pub/scm/linux/kernel/git/brodo/linux: pcmcia: remove some unused space characters pcmcia: soc_common.h: Replace zero-length array with flexible-array member pcmcia: cs_internal.h: Replace zero-length array with flexible-array member pcmcia: Use scnprintf() for avoiding potential buffer overflow pcmcia: omap: remove useless cast for driver.name
2020-04-07	drivers/misc/lkdtm/bugs.c: add arithmetic overflow and array bounds checks	Kees Cook
	Adds LKDTM tests for arithmetic overflow (both signed and unsigned), as well as array bounds checking. Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Dmitry Vyukov <dvyukov@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Dan Carpenter <dan.carpenter@oracle.com> Cc: Elena Petrova <lenaptr@google.com> Cc: "Gustavo A. R. Silva" <gustavo@embeddedor.com> Link: http://lkml.kernel.org/r/20200227193516.32566-4-keescook@chromium.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-07	mm/memory_hotplug: allow to specify a default online_type	David Hildenbrand
	For now, distributions implement advanced udev rules to essentially - Don't online any hotplugged memory (s390x) - Online all memory to ZONE_NORMAL (e.g., most virt environments like hyperv) - Online all memory to ZONE_MOVABLE in case the zone imbalance is taken care of (e.g., bare metal, special virt environments) In summary: All memory is usually onlined the same way, however, the kernel always has to ask user space to come up with the same answer. E.g., Hyper-V always waits for a memory block to get onlined before continuing, otherwise it might end up adding memory faster than onlining it, which can result in strange OOM situations. This waiting slows down adding of a bigger amount of memory. Let's allow to specify a default online_type, not just "online" and "offline". This allows distributions to configure the default online_type when booting up and be done with it. We can now specify "offline", "online", "online_movable" and "online_kernel" via - "memhp_default_state=" on the kernel cmdline - /sys/devices/system/memory/auto_online_blocks just like we are able to specify for a single memory block via /sys/devices/system/memory/memoryX/state Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Wei Yang <richard.weiyang@gmail.com> Reviewed-by: Baoquan He <bhe@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Oscar Salvador <osalvador@suse.de> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul Mackerras <paulus@samba.org> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Wei Liu <wei.liu@kernel.org> Cc: Yumei Huang <yuhuang@redhat.com> Link: http://lkml.kernel.org/r/20200317104942.11178-9-david@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-07	mm/memory_hotplug: convert memhp_auto_online to store an online_type	David Hildenbrand
	... and rename it to memhp_default_online_type. This is a preparation for more detailed default online behavior. Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Wei Yang <richard.weiyang@gmail.com> Reviewed-by: Baoquan He <bhe@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Oscar Salvador <osalvador@suse.de> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul Mackerras <paulus@samba.org> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Wei Liu <wei.liu@kernel.org> Cc: Yumei Huang <yuhuang@redhat.com> Link: http://lkml.kernel.org/r/20200317104942.11178-8-david@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-07	hv_balloon: don't check for memhp_auto_online manually	David Hildenbrand
	We get the MEM_ONLINE notifier call if memory is added right from the kernel via add_memory() or later from user space. Let's get rid of the "ha_waiting" flag - the wait event has an inbuilt mechanism (->done) for that. Initialize the wait event only once and reinitialize before adding memory. Unconditionally call complete() and wait_for_completion_timeout(). If there are no waiters, complete() will only increment ->done - which will be reset by reinit_completion(). If complete() has already been called, wait_for_completion_timeout() will not wait. There is still the chance for a small race between concurrent reinit_completion() and complete(). If complete() wins, we would not wait - which is tolerable (and the race exists in current code as well). Note: We only wait for "some" memory to get onlined, which seems to be good enough for now. [akpm@linux-foundation.org: register_memory_notifier() after init_completion(), per David] Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Baoquan He <bhe@redhat.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Wei Liu <wei.liu@kernel.org> Cc: Oscar Salvador <osalvador@suse.de> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul Mackerras <paulus@samba.org> Cc: Yumei Huang <yuhuang@redhat.com> Link: http://lkml.kernel.org/r/20200317104942.11178-6-david@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-07	drivers/base/memory: store mapping between MMOP_* and string in an array	David Hildenbrand
	Let's use a simple array which we can reuse soon. While at it, move the string->mmop conversion out of the device hotplug lock. Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Wei Yang <richard.weiyang@gmail.com> Reviewed-by: Baoquan He <bhe@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Oscar Salvador <osalvador@suse.de> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul Mackerras <paulus@samba.org> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Wei Liu <wei.liu@kernel.org> Cc: Yumei Huang <yuhuang@redhat.com> Link: http://lkml.kernel.org/r/20200317104942.11178-4-david@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-07	drivers/base/memory: map MMOP_OFFLINE to 0	David Hildenbrand
	Historically, we used the value -1. Just treat 0 as the special case now. Clarify a comment (which was wrong, when we come via device_online() the first time, the online_type would have been 0 / MEM_ONLINE). The default is now always MMOP_OFFLINE. This removes the last user of the manual "-1", which didn't use the enum value. This is a preparation to use the online_type as an array index. Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Wei Yang <richard.weiyang@gmail.com> Reviewed-by: Baoquan He <bhe@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Oscar Salvador <osalvador@suse.de> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul Mackerras <paulus@samba.org> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Wei Liu <wei.liu@kernel.org> Cc: Yumei Huang <yuhuang@redhat.com> Link: http://lkml.kernel.org/r/20200317104942.11178-3-david@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-07	drivers/base/memory: rename MMOP_ONLINE_KEEP to MMOP_ONLINE	David Hildenbrand
	Patch series "mm/memory_hotplug: allow to specify a default online_type", v3. Distributions nowadays use udev rules ([1] [2]) to specify if and how to online hotplugged memory. The rules seem to get more complex with many special cases. Due to the various special cases, CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE cannot be used. All memory hotplug is handled via udev rules. Every time we hotplug memory, the udev rule will come to the same conclusion. Especially Hyper-V (but also soon virtio-mem) add a lot of memory in separate memory blocks and wait for memory to get onlined by user space before continuing to add more memory blocks (to not add memory faster than it is getting onlined). This of course slows down the whole memory hotplug process. To make the job of distributions easier and to avoid udev rules that get more and more complicated, let's extend the mechanism provided by - /sys/devices/system/memory/auto_online_blocks - "memhp_default_state=" on the kernel cmdline to be able to specify also "online_movable" as well as "online_kernel" === Example /usr/libexec/config-memhotplug === #!/bin/bash VIRT=`systemd-detect-virt --vm` ARCH=`uname -p` sense_virtio_mem() { if [ -d "/sys/bus/virtio/drivers/virtio_mem/" ]; then DEVICES=`find /sys/bus/virtio/drivers/virtio_mem/ -maxdepth 1 -type l \| wc -l` if [ $DEVICES != "0" ]; then return 0 fi fi return 1 } if [ ! -e "/sys/devices/system/memory/auto_online_blocks" ]; then echo "Memory hotplug configuration support missing in the kernel" exit 1 fi if grep "memhp_default_state=" /proc/cmdline > /dev/null; then echo "Memory hotplug configuration overridden in kernel cmdline (memhp_default_state=)" exit 1 fi if [ $VIRT == "microsoft" ]; then echo "Detected Hyper-V on $ARCH" # Hyper-V wants all memory in ZONE_NORMAL ONLINE_TYPE="online_kernel" elif sense_virtio_mem; then echo "Detected virtio-mem on $ARCH" # virtio-mem wants all memory in ZONE_NORMAL ONLINE_TYPE="online_kernel" elif [ $ARCH == "s390x" ] \|\| [ $ARCH == "s390" ]; then echo "Detected $ARCH" # standby memory should not be onlined automatically ONLINE_TYPE="offline" elif [ $ARCH == "ppc64" ] \|\| [ $ARCH == "ppc64le" ]; then echo "Detected" $ARCH # PPC64 onlines all hotplugged memory right from the kernel ONLINE_TYPE="offline" elif [ $VIRT == "none" ]; then echo "Detected bare-metal on $ARCH" # Bare metal users expect hotplugged memory to be unpluggable. We assume # that ZONE imbalances on such enterpise servers cannot happen and is # properly documented ONLINE_TYPE="online_movable" else # TODO: Hypervisors that want to unplug DIMMs and can guarantee that ZONE # imbalances won't happen echo "Detected $VIRT on $ARCH" # Usually, ballooning is used in virtual environments, so memory should go to # ZONE_NORMAL. However, sometimes "movable_node" is relevant. ONLINE_TYPE="online" fi echo "Selected online_type:" $ONLINE_TYPE # Configure what to do with memory that will be hotplugged in the future echo $ONLINE_TYPE 2>/dev/null > /sys/devices/system/memory/auto_online_blocks if [ $? != "0" ]; then echo "Memory hotplug cannot be configured (e.g., old kernel or missing permissions)" # A backup udev rule should handle old kernels if necessary exit 1 fi # Process all already pluggedd blocks (e.g., DIMMs, but also Hyper-V or virtio-mem) if [ $ONLINE_TYPE != "offline" ]; then for MEMORY in /sys/devices/system/memory/memory; do STATE=`cat $MEMORY/state` if [ $STATE == "offline" ]; then echo $ONLINE_TYPE > $MEMORY/state fi done fi === Example /usr/lib/systemd/system/config-memhotplug.service === [Unit] Description=Configure memory hotplug behavior DefaultDependencies=no Conflicts=shutdown.target Before=sysinit.target shutdown.target After=systemd-modules-load.service ConditionPathExists=\|/sys/devices/system/memory/auto_online_blocks [Service] ExecStart=/usr/libexec/config-memhotplug Type=oneshot TimeoutSec=0 RemainAfterExit=yes [Install] WantedBy=sysinit.target === Example modification to the 40-redhat.rules [2] === : diff --git a/40-redhat.rules b/40-redhat.rules-new : index 2c690e5..168fd03 100644 : --- a/40-redhat.rules : +++ b/40-redhat.rules-new : @@ -6,6 +6,9 @@ SUBSYSTEM=="cpu", ACTION=="add", TEST=="online", ATTR{online}=="0", ATTR{online} : # Memory hotadd request : SUBSYSTEM!="memory", GOTO="memory_hotplug_end" : ACTION!="add", GOTO="memory_hotplug_end" : +# memory hotplug behavior configured : +PROGRAM=="grep online /sys/devices/system/memory/auto_online_blocks", GOTO="memory_hotplug_end" : + : PROGRAM="/bin/uname -p", RESULT=="s390", GOTO="memory_hotplug_end" : : ENV{.state}="online" === [1] https://github.com/lnykryn/systemd-rhel/pull/281 [2] https://github.com/lnykryn/systemd-rhel/blob/staging/rules/40-redhat.rules This patch (of 8): The name is misleading and it's not really clear what is "kept". Let's just name it like the online_type name we expose to user space ("online"). Add some documentation to the types. Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Wei Yang <richard.weiyang@gmail.com> Reviewed-by: Baoquan He <bhe@redhat.com> Acked-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Oscar Salvador <osalvador@suse.de> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Yumei Huang <yuhuang@redhat.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: K. Y. Srinivasan <kys@microsoft.com> Cc: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Cc: Paul Mackerras <paulus@samba.org> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Wei Liu <wei.liu@kernel.org> Link: http://lkml.kernel.org/r/20200319131221.14044-1-david@redhat.com Link: http://lkml.kernel.org/r/20200317104942.11178-2-david@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-07	drivers/base/memory.c: drop pages_correctly_probed()	David Hildenbrand
	pages_correctly_probed() is a leftover from ancient times. It dates back to commit 3947be1969a9 ("[PATCH] memory hotplug: sysfs and add/remove functions"), where Pg_reserved checks were added as a sfety net: /* * The probe routines leave the pages reserved, just * as the bootmem code does. Make sure they're still * that way. */ The checks were refactored quite a bit over the years, especially in commit b77eab7079d9 ("mm/memory_hotplug: optimize probe routine"), where checks for present, valid, and online sections were added. Hotplugged memory is added via add_memory(), which will create the full memmap for the hotplugged memory, and mark all sections valid and present. Only full memory blocks are onlined/offlined, so we also cannot have an inconsistency in that regard (especially, memory blocks with some sections being online and some being offline). 1. Boot memory always starts online. Since commit c5e79ef561b0 ("mm/memory_hotplug.c: don't allow to online/offline memory blocks with holes") we disallow to offline any memory with holes. Therefore, we never online memory with holes. Present and validity checks are superfluous. 2. Only complete memory blocks are onlined/offlined (and especially, the state - online or offline - is stored for whole memory blocks). Besides the core, only arch/powerpc/platforms/powernv/memtrace.c manually calls offline_pages() and fiddels with memory block states. But it also only offlines complete memory blocks. 3. To make any of these conditions trigger, something would have to be terribly messed up in the core. (e.g., online/offline only some sections of a memory block). 4. Memory unplug properly makes sure that all sysfs attributes were removed (and therefore, that all threads left the sysfs handlers). We don't have to worry about zombie devices at this point. 5. The valid_section_nr(section_nr) check is actually dead code, as it would never have been reached due to the WARN_ON_ONCE(!pfn_valid(pfn)). No wonder we haven't seen any of these errors in a long time (or even ever, according to my search). Let's just get rid of them. Now, all checks that could hinder onlining and offlining are completely contained in online_pages()/offline_pages(). Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Link: http://lkml.kernel.org/r/20200127110424.5757-3-david@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-07	drivers/base/memory.c: drop section_count	David Hildenbrand
	Patch series "mm: drop superfluous section checks when onlining/offlining". Let's drop some superfluous section checks on the onlining/offlining path. This patch (of 3): Since commit c5e79ef561b0 ("mm/memory_hotplug.c: don't allow to online/offline memory blocks with holes") we have a generic check in offline_pages() that disallows offlining memory blocks with holes. Memory blocks with missing sections are just another variant of these type of blocks. We can stop checking (and especially storing) present sections. A proper error message is now printed why offlining failed. section_count was initially introduced in commit 07681215975e ("Driver core: Add section count to memory_block struct") in order to detect when it is okay to remove a memory block. It was used in commit 26bbe7ef6d5c ("drivers/base/memory.c: prohibit offlining of memory blocks with missing sections") to disallow offlining memory blocks with missing sections. As we refactored creation/removal of memory devices and have a proper check for holes in place, we can drop the section_count. This also removes a leftover comment regarding the mem_sysfs_mutex, which was removed in commit 848e19ad3c33 ("drivers/base/memory.c: drop the mem_sysfs_mutex"). Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Link: http://lkml.kernel.org/r/20200127110424.5757-2-david@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-07	virtio-balloon: switch back to OOM handler for VIRTIO_BALLOON_F_DEFLATE_ON_OOM	David Hildenbrand
	Commit 71994620bb25 ("virtio_balloon: replace oom notifier with shrinker") changed the behavior when deflation happens automatically. Instead of deflating when called by the OOM handler, the shrinker is used. However, the balloon is not simply some other slab cache that should be shrunk when under memory pressure. The shrinker does not have a concept of priorities yet, so this behavior cannot be configured. Eventually once that is in place, we might want to switch back after doing proper testing. There was a report that this results in undesired side effects when inflating the balloon to shrink the page cache. [1] "When inflating the balloon against page cache (i.e. no free memory remains) vmscan.c will both shrink page cache, but also invoke the shrinkers -- including the balloon's shrinker. So the balloon driver allocates memory which requires reclaim, vmscan gets this memory by shrinking the balloon, and then the driver adds the memory back to the balloon. Basically a busy no-op." The name "deflate on OOM" makes it pretty clear when deflation should happen - after other approaches to reclaim memory failed, not while reclaiming. This allows to minimize the footprint of a guest - memory will only be taken out of the balloon when really needed. Keep using the shrinker for VIRTIO_BALLOON_F_FREE_PAGE_HINT, because this has no such side effects. Always register the shrinker with VIRTIO_BALLOON_F_FREE_PAGE_HINT now. We are always allowed to reuse free pages that are still to be processed by the guest. The hypervisor takes care of identifying and resolving possible races between processing a hinting request and the guest reusing a page. In contrast to pre commit 71994620bb25 ("virtio_balloon: replace oom notifier with shrinker"), don't add a module parameter to configure the number of pages to deflate on OOM. Can be re-added if really needed. Also, pay attention that leak_balloon() returns the number of 4k pages - convert it properly in virtio_balloon_oom_notify(). Testing done by Tyler for future reference: Test setup: VM with 16 CPU, 64GB RAM. Running Debian 10. We have a 42 GB file full of random bytes that we continually cat to /dev/null. This fills the page cache as the file is read. Meanwhile, we trigger the balloon to inflate, with a target size of 53 GB. This setup causes the balloon inflation to pressure the page cache as the page cache is also trying to grow. Afterwards we shrink the balloon back to zero (so total deflate == total inflate). Without this patch (kernel 4.19.0-5): Inflation never reaches the target until we stop the "cat file > /dev/null" process. Total inflation time was 542 seconds. The longest period that made no net forward progress was 315 seconds. Result of "grep balloon /proc/vmstat" after the test: balloon_inflate 154828377 balloon_deflate 154828377 With this patch (kernel 5.6.0-rc4+): Total inflation duration was 63 seconds. No deflate-queue activity occurs when pressuring the page-cache. Result of "grep balloon /proc/vmstat" after the test: balloon_inflate 12968539 balloon_deflate 12968539 Conclusion: This patch fixes the issue. In the test it reduced inflate/deflate activity by 12x, and reduced inflation time by 8.6x. But more importantly, if we hadn't killed the "cat file > /dev/null" process then, without the patch, the inflation process would never reach the target. [1] https://www.spinics.net/lists/linux-virtualization/msg40863.html Link: http://lkml.kernel.org/r/20200311135523.18512-2-david@redhat.com Fixes: 71994620bb25 ("virtio_balloon: replace oom notifier with shrinker") Signed-off-by: David Hildenbrand <david@redhat.com> Reported-by: Tyler Sanderson <tysand@google.com> Tested-by: Tyler Sanderson <tysand@google.com> Acked-by: David Rientjes <rientjes@google.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Cc: Wei Wang <wei.w.wang@intel.com> Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Nadav Amit <namit@vmware.com> Cc: Michal Hocko <mhocko@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>