aboutsummaryrefslogtreecommitdiff
path: root/drivers/nvdimm
AgeCommit message (Collapse)Author
2016-10-07Merge branch 'for-4.9/dax' into libnvdimm-for-nextDan Williams
2016-10-07Merge branch 'for-4.9/libnvdimm' into libnvdimm-for-nextDan Williams
2016-10-07/dev/dax: fix Kconfig dependency build breakageRoss Zwisler
The function dax_pmem_probe() in drivers/dax/pmem.c is compiled under the CONFIG_DEV_DAX_PMEM tri-state config option. This config option currently only depends on CONFIG_NVDIMM_DAX, a bool, which means that the following configuration is possible: CONFIG_LIBNVDIMM=m ... CONFIG_NVDIMM_DAX=y CONFIG_DEV_DAX=y CONFIG_DEV_DAX_PMEM=y With this config LIBNVDIMM is compiled as a module with NVDIMM_DAX=y just meaning that we will compile drivers/nvdimm/dax_devs.c into that module. However, dax_pmem_probe() depends on several symbols defined in drivers/nvdimm/dax_devs.c, which results in the following build errors: drivers/built-in.o: In function `dax_pmem_probe': linux/drivers/dax/pmem.c:70: undefined reference to `to_nd_dax' linux/drivers/dax/pmem.c:74: undefined reference to `nvdimm_namespace_common_probe' linux/drivers/dax/pmem.c:80: undefined reference to `devm_nsio_enable' linux/drivers/dax/pmem.c:81: undefined reference to `nvdimm_setup_pfn' linux/drivers/dax/pmem.c:84: undefined reference to `devm_nsio_disable' linux/drivers/dax/pmem.c:122: undefined reference to `to_nd_region' drivers/built-in.o: In function `dax_pmem_init': linux/drivers/dax/pmem.c:147: undefined reference to `__nd_driver_register' Fix this by making NVDIMM_DAX a tristate. DEV_DAX_PMEM depends on NVDIMM_DAX which depends on LIBNVDIMM. Since they are all now tristates, if LIBNVDIMM is built as a kernel module DEV_DAX_PMEM will be as well. This prevents dax_devs.c from being built as a built-in while its dependencies are in the libnvdimm.ko module. Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-10-07libnvdimm, namespace: allow creation of multiple pmem-namespaces per regionDan Williams
Similar to BLK regions, publish new seed namespace devices to allow unused PMEM region capacity to be consumed by additional namespaces. Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-10-07libnvdimm, namespace: lift single pmem limit in scan_labels()Dan Williams
Now that the rest of the infrastructure has been converted to handle multi-pmem configurations, lift the artificial barrier at scan time. Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-10-07libnvdimm, namespace: filter out of range labels in scan_labels()Dan Williams
Short-circuit doomed-to-fail label validation attempts by skipping labels that are outside the given region. For example a DIMM that has multiple PMEM regions will waste time attempting to create namespaces only to find that the interleave-set-cookie does not validate, e.g.: nd_region region6: invalid cookie in label: 73e608dc-47b9-4b2a-b5c7-2d55a32e0c2 Similar to how we skip BLK labels when performing PMEM validation we can skip out-of-range labels early. Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-10-07libnvdimm, namespace: enable allocation of multiple pmem namespacesDan Williams
Now that we have nd_region_available_dpa() able to handle the presence of multiple PMEM allocations in aliased PMEM regions, reuse that same infrastructure to track allocations from free space. In particular handle allocating from an aliased PMEM region in the case where there are dis-contiguous holes. The allocation for BLK and PMEM are documented in the space_valid() helper: BLK-space is valid as long as it does not precede a PMEM allocation in a given region. PMEM-space must be contiguous and adjacent to an existing existing allocation (if one exists). Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-10-07libnvdimm, namespace: update label implementation for multi-pmemDan Williams
Instead of assuming that there will only ever be one allocated range at the start of the region, account for additional namespaces that might start at an offset from the region base. After this change pmem namespaces now have a reason to carry an array of resources similar to blk. Unifying the resource tracking infrastructure in nd_namespace_common is a future cleanup candidate. Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-10-07libnvdimm, namespace: expand pmem device naming scheme for multi-pmemDan Williams
pmem devices are currently named /dev/pmem<region-index>. Preserve the naming of the 0th device, but add a ".<namespace-index>" for other devices. Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-10-07libnvdimm, region: update nd_region_available_dpa() for multi-pmem supportDan Williams
The free dpa (dimm-physical-address) space calculation reports how much free space is available with consideration for aliased BLK + PMEM regions. Recall that BLK capacity is allocated from high addresses and PMEM is allocated from low addresses in their respective regions. nd_region_available_dpa() accounts for the fact that the largest encroachment (lowest starting address) into PMEM capacity by a BLK allocation limits the available capacity to that point, regardless if there is BLK allocation hole at a higher address. Similarly, for the multi-pmem case we need to track the largest encroachment (highest ending address) of a PMEM allocation in BLK capacity regardless of whether there is an allocation hole that a BLK allocation could fill at a lower address. Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-10-07libnvdimm, namespace: sort namespaces by dpa at initDan Williams
Add more determinism to initial namespace device-name assignments by sorting the namespaces by starting dpa. Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-10-07libnvdimm, namespace: allow multiple pmem-namespaces per region at scan timeDan Williams
If label scanning finds multiple valid pmem namespaces allow them to be surfaced rather than fail namespace scanning. Support for creating multiple namespaces per region is saved for a later patch. Note that this adds some new error messages to clarify which of the pmem namespaces in the set are potentially impacted by invalid labels. Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-10-05libnvdimm, namespace: unify blk and pmem label scanningDan Williams
In preparation for allowing multiple namespace per pmem region, unify blk and pmem label scanning. Given that blk regions already support multiple namespaces, teaching that path how to do pmem namespace scanning is an incremental step towards multiple pmem namespace support. This should be functionally equivalent to the previous state in that stops after finding the first valid pmem label set. Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-10-05libnvdimm, namespace: refactor uuid_show() into a namespace_to_uuid() helperDan Williams
The ability to translate a generic struct device pointer into a namespace uuid is a useful utility as we go to unify the blk and pmem label scanning paths. Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-09-30libnvdimm, label: convert label tracking to a linked listDan Williams
In preparation for enabling multiple namespaces per pmem region, convert the label tracking to use a linked list. In particular this will allow select_pmem_id() to move labels from the unvalidated state to the validated state. Currently we only track one validated set per-region. Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-09-30libnvdimm, region: move region-mapping input-paramters to nd_mapping_descDan Williams
Before we add more libnvdimm-private fields to nd_mapping make it clear which parameters are input vs libnvdimm internals. Use struct nd_mapping_desc instead of struct nd_mapping in nd_region_desc and make struct nd_mapping private to libnvdimm. Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-09-30nvdimm: reduce duplicated wpq flushesDave Jiang
Existing implemenetation writes to all the flush hint addresses for a given ND region. This is not necessary as the flushes are per imc and not per DIMM. Search the mappings and clear out the duplicates at init to avoid multiple flush to the same imc. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-09-30libnvdimm: clear the internal poison_list when clearing badblocksVishal Verma
nvdimm_clear_poison cleared the user-visible badblocks, and sent commands to the NVDIMM to clear the areas marked as 'poison', but it neglected to clear the same areas from the internal poison_list which is used to marshal ARS results before sorting them by namespace. As a result, once on-demand ARS functionality was added: 37b137f nfit, libnvdimm: allow an ARS scrub to be triggered on demand A scrub triggered from either sysfs or an MCE was found to be adding stale entries that had been cleared from gendisk->badblocks, but were still present in nvdimm_bus->poison_list. Additionally, the stale entries could be triggered into producing stale disk->badblocks by simply disabling and re-enabling the namespace or region. This adds the missing step of clearing poison_list entries when clearing poison, so that it is always in sync with badblocks. Fixes: 37b137f ("nfit, libnvdimm: allow an ARS scrub to be triggered on demand") Signed-off-by: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-09-30pmem: reduce kmap_atomic sections to the memcpys onlyVishal Verma
pmem_do_bvec used to kmap_atomic at the begin, and only unmap at the end. Things like nvdimm_clear_poison may want to do nvdimm subsystem bookkeeping operations that may involve taking locks or doing memory allocations, and we can't do that from the atomic context. Reduce the atomic context to just what needs it - the memcpy to/from pmem. Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-09-24libnvdimm, region: fix flush hint table thinkoDan Williams
The definition of the flush hint table as: void __iomem *flush_wpq[0][0]; ...passed the unit test, but is broken as flush_wpq[0][1] and flush_wpq[1][0] refer to the same entry. Fix this to use a helper that calculates a slot in the table based on the geometry of flush hints in the region. This is important to get right since virtualization solutions use this mechanism to trigger hypervisor flushes to platform persistence. Reported-by: Dave Jiang <dave.jiang@intel.com> Tested-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-09-21nvdimm: remove duplicate nd_mapping declarationDave Jiang
Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-09-21libnvdimm, namespace: debug invalid interleave-set-cookie valuesDan Williams
If platform firmware fails to populate unique / non-zero serial number data for each nvdimm in an interleave-set it may cause pmem region initialization to fail. Add a debug message for this case. Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-09-21libnvdimm: fix devm_nvdimm_memremap() error pathDan Williams
The internal alloc_nvdimm_map() helper might fail, particularly if the memory region is already busy. Report request_mem_region() failures and check for the failure. Reported-by: Ryan Chen <ryan.chan105@gmail.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-09-19nvdimm: fix PHYS_PFN/PFN_PHYS mixupOliver O'Halloran
nd_activate_region() iomaps any hint addresses required when activating a region. To prevent duplicate mappings it checks the PFN of the hint to be mapped against the PFNs of the already mapped hints. Unfortunately it doesn't convert the PFN back into a physical address before passing it to devm_nvdimm_ioremap(). Instead it applies PHYS_PFN a second time which ends about as well as you would imagine. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-09-09libnvdimm: allow legacy (e820) pmem region to clear bad blocksDave Jiang
Bad blocks can be injected via /sys/block/pmemN/badblocks. In a situation where legacy pmem is being used or a pmem region created by using memmap kernel parameter, the injected bad blocks are not cleared due to nvdimm_clear_poison() failing from lack of ndctl function pointer. In this case we need to just return as handled and allow the bad blocks to be cleared rather than fail. Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-09-01libnvdimm: Fix nvdimm_probe error on NVDIMM-NToshi Kani
'ndctl list --buses --dimms' does not list any NVDIMM-Ns since they are considered as idle. ndctl checks if any driver is attached to nmem device. nvdimm_probe() always fails in nvdimm_init_nsarea() since NVDIMM-Ns do not implement optinal ND_CMD_GET_CONFIG_DATA command. Change nvdimm_probe() to accept the case that the CONFIG_DATA command is not implemented for NVDIMM-Ns. The driver attaches without ndd, which keeps it no-op to the device. Reported-by: Brian Boylston <brian.boylston@hpe.com> Signed-off-by: Toshi Kani <toshi.kani@hpe.com> Cc: Dan Williams <dan.j.williams@intel.com> Tested-by: Johannes Thumshirn <jthumshirn@suse.de> Acked-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-09-01nvdimm: Spelling s/unacknoweldged/unacknowledged/Geert Uytterhoeven
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-08-29acpi, nfit: add dimm device notification supportDan Williams
Per "ACPI 6.1 Section 9.20.3" NVDIMM devices, children of the ACPI0012 NVDIMM Root device, can receive health event notifications. Given that these devices are precluded from registering a notification handler via acpi_driver.acpi_device_ops (due to no _HID), we use acpi_install_notify_handler() directly. The registered handler, acpi_nvdimm_notify(), triggers a poll(2) event on the nmemX/nfit/flags sysfs attribute when a health event notification is received. Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Toshi Kani <toshi.kani@hpe.com> Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Acked-by: Rafael J. Wysocki <rafael@kernel.org> Reviewed-by: Toshi Kani <toshi.kani@hpe.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-08-08nvdimm, btt: add a size attribute for BTTsVishal Verma
To be consistent with other namespaces, expose a 'size' attribute for BTT devices also. Cc: Dan Williams <dan.j.williams@intel.com> Reported-by: Linda Knippers <linda.knippers@hpe.com> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-08-07block: rename bio bi_rw to bi_opfJens Axboe
Since commit 63a4cc24867d, bio->bi_rw contains flags in the lower portion and the op code in the higher portions. This means that old code that relies on manually setting bi_rw is most likely going to be broken. Instead of letting that brokeness linger, rename the member, to force old and out-of-tree code to break at compile time instead of at runtime. No intended functional changes in this commit. Signed-off-by: Jens Axboe <axboe@fb.com>
2016-08-07block/mm: make bdev_ops->rw_page() take a bool for read/writeJens Axboe
Commit abf545484d31 changed it from an 'rw' flags type to the newer ops based interface, but now we're effectively leaking some bdev internals to the rest of the kernel. Since we only care about whether it's a read or a write at that level, just pass in a bool 'is_write' parameter instead. Then we can also move op_is_write() and friends back under CONFIG_BLOCK protection. Reviewed-by: Mike Christie <mchristi@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-08-04mm/block: convert rw_page users to bio op useMike Christie
The rw_page users were not converted to use bio/req ops. As a result bdev_write_page is not passing down REQ_OP_WRITE and the IOs will be sent down as reads. Signed-off-by: Mike Christie <mchristi@redhat.com> Fixes: 4e1b2d52a80d ("block, fs, drivers: remove REQ_OP compat defs and related code") Modified by me to: 1) Drop op_flags passing into ->rw_page(), as we don't use it. 2) Make op_is_write() and friends safe to use for !CONFIG_BLOCK Signed-off-by: Jens Axboe <axboe@fb.com>
2016-07-28Merge tag 'libnvdimm-for-4.8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm updates from Dan Williams: - Replace pcommit with ADR / directed-flushing. The pcommit instruction, which has not shipped on any product, is deprecated. Instead, the requirement is that platforms implement either ADR, or provide one or more flush addresses per nvdimm. ADR (Asynchronous DRAM Refresh) flushes data in posted write buffers to the memory controller on a power-fail event. Flush addresses are defined in ACPI 6.x as an NVDIMM Firmware Interface Table (NFIT) sub-structure: "Flush Hint Address Structure". A flush hint is an mmio address that when written and fenced assures that all previous posted writes targeting a given dimm have been flushed to media. - On-demand ARS (address range scrub). Linux uses the results of the ACPI ARS commands to track bad blocks in pmem devices. When latent errors are detected we re-scrub the media to refresh the bad block list, userspace can also request a re-scrub at any time. - Support for the Microsoft DSM (device specific method) command format. - Support for EDK2/OVMF virtual disk device memory ranges. - Various fixes and cleanups across the subsystem. * tag 'libnvdimm-for-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (41 commits) libnvdimm-btt: Delete an unnecessary check before the function call "__nd_device_register" nfit: do an ARS scrub on hitting a latent media error nfit: move to nfit/ sub-directory nfit, libnvdimm: allow an ARS scrub to be triggered on demand libnvdimm: register nvdimm_bus devices with an nd_bus driver pmem: clarify a debug print in pmem_clear_poison x86/insn: remove pcommit Revert "KVM: x86: add pcommit support" nfit, tools/testing/nvdimm/: unify shutdown paths libnvdimm: move ->module to struct nvdimm_bus_descriptor nfit: cleanup acpi_nfit_init calling convention nfit: fix _FIT evaluation memory leak + use after free tools/testing/nvdimm: add manufacturing_{date|location} dimm properties tools/testing/nvdimm: add virtual ramdisk range acpi, nfit: treat virtual ramdisk SPA as pmem region pmem: kill __pmem address space pmem: kill wmb_pmem() libnvdimm, pmem: use nvdimm_flush() for namespace I/O writes fs/dax: remove wmb_pmem() libnvdimm, pmem: flush posted-write queues on shutdown ...
2016-07-26Merge branch 'for-4.8/drivers' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block driver updates from Jens Axboe: "This branch also contains core changes. I've come to the conclusion that from 4.9 and forward, I'll be doing just a single branch. We often have dependencies between core and drivers, and it's hard to always split them up appropriately without pulling core into drivers when that happens. That said, this contains: - separate secure erase type for the core block layer, from Christoph. - set of discard fixes, from Christoph. - bio shrinking fixes from Christoph, as a followup up to the op/flags change in the core branch. - map and append request fixes from Christoph. - NVMeF (NVMe over Fabrics) code from Christoph. This is pretty exciting! - nvme-loop fixes from Arnd. - removal of ->driverfs_dev from Dan, after providing a device_add_disk() helper. - bcache fixes from Bhaktipriya and Yijing. - cdrom subchannel read fix from Vchannaiah. - set of lightnvm updates from Wenwei, Matias, Johannes, and Javier. - set of drbd updates and fixes from Fabian, Lars, and Philipp. - mg_disk error path fix from Bart. - user notification for failed device add for loop, from Minfei. - NVMe in general: + NVMe delay quirk from Guilherme. + SR-IOV support and command retry limits from Keith. + fix for memory-less NUMA node from Masayoshi. + use UINT_MAX for discard sectors, from Minfei. + cancel IO fixes from Ming. + don't allocate unused major, from Neil. + error code fixup from Dan. + use constants for PSDT/FUSE from James. + variable init fix from Jay. + fabrics fixes from Ming, Sagi, and Wei. + various fixes" * 'for-4.8/drivers' of git://git.kernel.dk/linux-block: (115 commits) nvme/pci: Provide SR-IOV support nvme: initialize variable before logical OR'ing it block: unexport various bio mapping helpers scsi/osd: open code blk_make_request target: stop using blk_make_request block: simplify and export blk_rq_append_bio block: ensure bios return from blk_get_request are properly initialized virtio_blk: use blk_rq_map_kern memstick: don't allow REQ_TYPE_BLOCK_PC requests block: shrink bio size again block: simplify and cleanup bvec pool handling block: get rid of bio_rw and READA block: don't ignore -EOPNOTSUPP blkdev_issue_write_same block: introduce BLKDEV_DISCARD_ZERO to fix zeroout NVMe: don't allocate unused nvme_major nvme: avoid crashes when node 0 is memoryless node. nvme: Limit command retries loop: Make user notify for adding loop device failed nvme-loop: fix nvme-loop Kconfig dependencies nvmet: fix return value check in nvmet_subsys_alloc() ...
2016-07-26Merge branch 'for-4.8/core' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull core block updates from Jens Axboe: - the big change is the cleanup from Mike Christie, cleaning up our uses of command types and modified flags. This is what will throw some merge conflicts - regression fix for the above for btrfs, from Vincent - following up to the above, better packing of struct request from Christoph - a 2038 fix for blktrace from Arnd - a few trivial/spelling fixes from Bart Van Assche - a front merge check fix from Damien, which could cause issues on SMR drives - Atari partition fix from Gabriel - convert cfq to highres timers, since jiffies isn't granular enough for some devices these days. From Jan and Jeff - CFQ priority boost fix idle classes, from me - cleanup series from Ming, improving our bio/bvec iteration - a direct issue fix for blk-mq from Omar - fix for plug merging not involving the IO scheduler, like we do for other types of merges. From Tahsin - expose DAX type internally and through sysfs. From Toshi and Yigal * 'for-4.8/core' of git://git.kernel.dk/linux-block: (76 commits) block: Fix front merge check block: do not merge requests without consulting with io scheduler block: Fix spelling in a source code comment block: expose QUEUE_FLAG_DAX in sysfs block: add QUEUE_FLAG_DAX for devices to advertise their DAX support Btrfs: fix comparison in __btrfs_map_block() block: atari: Return early for unsupported sector size Doc: block: Fix a typo in queue-sysfs.txt cfq-iosched: Charge at least 1 jiffie instead of 1 ns cfq-iosched: Fix regression in bonnie++ rewrite performance cfq-iosched: Convert slice_resid from u64 to s64 block: Convert fifo_time from ulong to u64 blktrace: avoid using timespec block/blk-cgroup.c: Declare local symbols static block/bio-integrity.c: Add #include "blk.h" block/partition-generic.c: Remove a set-but-not-used variable block: bio: kill BIO_MAX_SIZE cfq-iosched: temporarily boost queue priority for idle classes block: drbd: avoid to use BIO_MAX_SIZE block: bio: remove BIO_MAX_SECTORS ...
2016-07-24Merge branch 'for-4.8/libnvdimm' into libnvdimm-for-nextDan Williams
2016-07-24libnvdimm-btt: Delete an unnecessary check before the function call ↵Markus Elfring
"__nd_device_register" The __nd_device_register() function tests whether its argument is NULL and then returns immediately. Thus the test around the call is not needed. This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-07-23nfit, libnvdimm: allow an ARS scrub to be triggered on demandVishal Verma
Normally, an ARS (Address Range Scrub) only happens at boot/initialization time. There can however arise situations where a bus-wide rescan is needed - notably, in the case of discovering a latent media error, we should do a full rescan to figure out what other sectors are bad, and thus potentially avoid triggering an mce on them in the future. Also provide a sysfs trigger to start a bus-wide scrub. Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-07-23libnvdimm: register nvdimm_bus devices with an nd_bus driverDan Williams
A recent effort to add a new nvdimm bus provider attribute highlighted a race between interrogating nvdimm_bus->nd_desc and nvdimm_bus tear down. The typical way to handle these races is to take the device_lock() in the attribute method and validate that the device is still active. In order for a device to be 'active' it needs to be associated with a driver. So, we create the small boilerplate for a driver and register nvdimm_bus devices on the 'nvdimm_bus_type' bus. A result of this change is that ndbusX devices now appear under /sys/bus/nd/devices. In fact this makes /sys/class/nd somewhat redundant, but removing that will need to take a long deprecation period given its use by ndctl binaries in the field. This change naturally pulls code from drivers/nvdimm/core.c to drivers/nvdimm/bus.c, so it is a nice code organization clean-up as well. Cc: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-07-23pmem: clarify a debug print in pmem_clear_poisonVishal Verma
Prefix the sector number being cleared with a '0x' to make it clear that this is a hex value. Signed-off-by: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-07-21libnvdimm: move ->module to struct nvdimm_bus_descriptorDan Williams
Let the provider module be explicitly passed in rather than implicitly assumed by the module that calls nvdimm_bus_register(). This is in preparation for unifying the nfit and nfit_test driver teardown paths. Reviewed-by: Lee, Chun-Yi <jlee@suse.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-07-20block: add QUEUE_FLAG_DAX for devices to advertise their DAX supportToshi Kani
Currently, presence of direct_access() in block_device_operations indicates support of DAX on its block device. Because block_device_operations is instantiated with 'const', this DAX capablity may not be enabled conditinally. In preparation for supporting DAX to device-mapper devices, add QUEUE_FLAG_DAX to request_queue flags to advertise their DAX support. This will allow to set the DAX capability based on how mapped device is composed. Signed-off-by: Toshi Kani <toshi.kani@hpe.com> Acked-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: <linux-s390@vger.kernel.org> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-07-12pmem: kill __pmem address spaceDan Williams
The __pmem address space was meant to annotate codepaths that touch persistent memory and need to coordinate a call to wmb_pmem(). Now that wmb_pmem() is gone, there is little need to keep this annotation. Cc: Christoph Hellwig <hch@lst.de> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-07-12libnvdimm, pmem: use nvdimm_flush() for namespace I/O writesDan Williams
nsio_rw_bytes() is used to write info block metadata to the namespace, so it should trigger a flush after every write. Replace wmb_pmem() with nvdimm_flush() in this path. Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-07-12libnvdimm, pmem: flush posted-write queues on shutdownDan Williams
Commit writes to media on system shutdown or pmem driver unload. Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-07-11libnvdimm, pmem: use REQ_FUA, REQ_FLUSH for nvdimm_flush()Dan Williams
Given that nvdimm_flush() has higher overhead than wmb_pmem() (pointer chasing through nd_region), and that we otherwise assume a platform has ADR capability when flush hints are not present, move nvdimm_flush() to REQ_FLUSH context. Note that we still arrange for nvdimm_flush() to be called even in the ADR case. We need at least once wmb() fence to push buffered writes in the cpu out to the ADR protected domain. Cc: Toshi Kani <toshi.kani@hpe.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-07-11libnvdimm: cycle flush hintsDan Williams
When the NFIT provides multiple flush hint addresses per-dimm it is expressing that the platform is capable of processing multiple flush requests in parallel. There is some fixed cost per flush request, let the cost be shared in parallel on multiple cpus. Since there may not be enough flush hint addresses for each cpu to have one, keep a per-cpu index of the last used hint, hash it with current pid, and assume that access pattern and scheduler randomness will keep the flush-hint usage somewhat staggered across cpus. Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-07-11libnvdimm: introduce nvdimm_flush() and nvdimm_has_flush()Dan Williams
nvdimm_flush() is a replacement for the x86 'pcommit' instruction. It is an optional write flushing mechanism that an nvdimm bus can provide for the pmem driver to consume. In the case of the NFIT nvdimm-bus-provider nvdimm_flush() is implemented as a series of flush-hint-address [1] writes to each dimm in the interleave set (region) that backs the namespace. The nvdimm_has_flush() routine relies on platform firmware to describe the flushing capabilities of a platform. It uses the heuristic of whether an nvdimm bus provider provides flush address data to return a ternary result: 1: flush addresses defined 0: dimm topology described without flush addresses (assume ADR) -errno: no topology information, unable to determine flush mechanism The pmem driver is expected to take the following actions on this ternary result: 1: nvdimm_flush() in response to REQ_FUA / REQ_FLUSH and shutdown 0: do not set, WC or FUA on the queue, take no further action -errno: warn and then operate as if nvdimm_has_flush() returned '0' The caveat of this heuristic is that it can not distinguish the "dimm does not have flush address" case from the "platform firmware is broken and failed to describe a flush address". Given we are already explicitly trusting the NFIT there's not much more we can do beyond blacklisting broken firmwares if they are ever encountered. Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-07-11libnvdimm: keep region data alive over namespace removalDan Williams
nd_region device driver data will be used in the namespace i/o path. Re-order nd_region_remove() to ensure this data stays live across namespace device removal Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-07-11libnvdimm, nfit: move flush hint mapping to region-device driver-dataDan Williams
In preparation for triggering flushes of a DIMM's writes-posted-queue (WPQ) via the pmem driver move mapping of flush hint addresses to the region driver. Since this uses devm_nvdimm_memremap() the flush addresses will remain mapped while any region to which the dimm belongs is active. We need to communicate more information to the nvdimm core to facilitate this mapping, namely each dimm object now carries an array of flush hint address resources. Signed-off-by: Dan Williams <dan.j.williams@intel.com>