linux.git - Linux kernel

Age	Commit message (Collapse)	Author
2018-09-04	crypto: arm64/crct10dif - implement non-Crypto Extensions alternative	Ard Biesheuvel
	The arm64 implementation of the CRC-T10DIF algorithm uses the 64x64 bit polynomial multiplication instructions, which are optional in the architecture, and if these instructions are not available, we fall back to the C routine which is slow and inefficient. So let's reuse the 64x64 bit PMULL alternative from the GHASH driver that uses a sequence of ~40 instructions involving 8x8 bit PMULL and some shifting and masking. This is a lot slower than the original, but it is still twice as fast as the current [unoptimized] C code on Cortex-A53, and it is time invariant and much easier on the D-cache. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-09-04	crypto: arm64/crct10dif - preparatory refactor for 8x8 PMULL version	Ard Biesheuvel
	Reorganize the CRC-T10DIF asm routine so we can easily instantiate an alternative version based on 8x8 polynomial multiplication in a subsequent patch. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-08-04	crypto: arm64/crct10dif - add non-SIMD generic fallback	Ard Biesheuvel
	The arm64 kernel will shortly disallow nested kernel mode NEON, so add a fallback to scalar C code that can be invoked in that case. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-12-07	crypto: arm64/crct10dif - port x86 SSE implementation to arm64	Ard Biesheuvel
	This is a transliteration of the Intel algorithm implemented using SSE and PCLMULQDQ instructions that resides in the file arch/x86/crypto/crct10dif-pcl-asm_64.S, but simplified to only operate on buffers that are 16 byte aligned (but of any size) Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>