Age | Commit message (Collapse) | Author |
|
The SSE version has been no different than the mmx one since commit a41bf09d
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
|
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
|
* commit '9a9e2f1c8aa4539a261625145e5c1f46a8106ac2':
dsputil: Split audio operations off into a separate context
Conflicts:
configure
libavcodec/takdec.c
libavcodec/x86/Makefile
libavcodec/x86/dsputil.asm
libavcodec/x86/dsputil_init.c
libavcodec/x86/dsputil_mmx.c
libavcodec/x86/dsputil_x86.h
Merged-by: Michael Niedermayer <michaelni@gmx.at>
|
|
|
|
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
|
See commits
649c00c96d7044aed46d70623e47d7434318e6b9
5fecfb7d58a12baf326e99f2d071060f2638d93c
73b02e24604961e49a63ca34203d8f6c56612117
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
|
This will pick the "best" simple idct compatible idct
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
|
Fixes compilation with NASM x86_64
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
|
Also replace INLINE_<opt> with EXTERNAL_<opt> that were wrongly
changed by commit 2b05db4f8102148d013755ac2a7e47f6d79ff7ca
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
|
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
|
* commit 'e74433a8e6fc00c8dbde293c97a3e45384c2c1d9':
dsputil: Split clear_block*/fill_block* off into a separate context
Conflicts:
configure
libavcodec/asvdec.c
libavcodec/dnxhddec.c
libavcodec/dnxhdenc.c
libavcodec/dsputil.h
libavcodec/eamad.c
libavcodec/intrax8.c
libavcodec/mjpegdec.c
libavcodec/ppc/dsputil_ppc.c
libavcodec/vc1dec.c
libavcodec/x86/dsputil_init.c
libavcodec/x86/dsputil_mmx.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
|
|
|
|
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
|
Those macros take a byte number as shift argument, as this argument
differs between MMX and SSE2 instructions.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
|
Fixes track ticket 3717.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
|
They used an extra, undeclared register. Fixes a crash in
fate-vsynth3-ffvhuff444p16
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
|
* commit '570d4b21863b6254d6bbca9c528bede471bb4478':
x86: h264: Don't keep data in the redzone across function calls on 64 bit unix
Merged-by: Michael Niedermayer <michaelni@gmx.at>
|
|
We know that the called function (ff_chroma_inter_body_mmxext)
doesn't touch the redzone, and thus will be kept intact - thus,
this doesn't fix any bug per se.
However, valgrind's memcheck tool intentionally assumes that the
redzone is clobbered on every function call and function return
(see a long comment in valgrind/memcheck/mc_main.c). This avoids
false positives in that tool, at the cost of an extra stack pointer
adjustment.
The other alternative would be a valgrind suppression for this issue,
but that's an extra burden for everybody that wants to run libavcodec
within valgrind.
Signed-off-by: Martin Storsjö <martin@martin.st>
|
|
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
|
There's an SSE2 version already, and technically the SSE version
on x86_64 was wrong (using pshufd and pshuflw, SSE2 instructions).
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
|
tos3k-vp9-b10000.webm on a Core i5-4200U @1.6GHz
1219 decicycles in ff_vp9_ipred_dc_32x32_ssse3, 131070 runs, 2 skips
439 decicycles in ff_vp9_ipred_dc_32x32_avx2, 131070 runs, 2 skips
3570 decicycles in ff_vp9_ipred_dc_top_32x32_ssse3, 4096 runs, 0 skips
2494 decicycles in ff_vp9_ipred_dc_top_32x32_avx2, 4096 runs, 0 skips
1419 decicycles in ff_vp9_ipred_dc_left_32x32_ssse3, 16384 runs, 0 skips
717 decicycles in ff_vp9_ipred_dc_left_32x32_avx2, 16384 runs, 0 skips
2737 decicycles in ff_vp9_ipred_tm_32x32_avx, 1024 runs, 0 skips
2088 decicycles in ff_vp9_ipred_tm_32x32_avx2, 1024 runs, 0 skips
3090 decicycles in ff_vp9_ipred_v_32x32_avx, 512 runs, 0 skips
2226 decicycles in ff_vp9_ipred_v_32x32_avx2, 512 runs, 0 skips
1565 decicycles in ff_vp9_ipred_h_32x32_avx, 1024 runs, 0 skips
922 decicycles in ff_vp9_ipred_h_32x32_avx2, 1024 runs, 0 skips
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
|
This reduces differences with the fork
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
|
APE is not the sole codec using scalarproduct_and_madd_int16.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
|
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
|
Only the xy2 functions aren't.
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
|
Signed-off-by: Timothy Gu <timothygu99@gmail.com>
Several bugfixes by: Christophe Gisquet <christophe.gisquet@gmail.com>
See: [FFmpeg-devel] [WIP] [PATCH 4/4] x86: dsputilenc: convert hf_noise*_mmx to yasm
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
|
The immediate value may be 0.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
|
The macro was not using the parameter but unconditionally using sse4.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
|
The check is meant for k8 CPUs. sad16_sse2 is ~20% faster than sad16_mmxext on k10.
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
|
It's needed for ff_svq1enc_init_x86() even if simd functions are disabled.
Alternatively, svq1enc_init.c could be made and the relevant code moved there.
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
|
Remove duplicate prototypes and fix int -> intptr_t in another
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
|
When there are 2 functions that are <= SSE2, only one is needed for x86_64.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
|
Signed-off-by: Timothy Gu <timothygu99@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
|
Signed-off-by: Timothy Gu <timothygu99@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
|
C MMX SSE2
Cycles: 3092 1053 578
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
|
add_hfyu_left_pred()
This avoids potential issues with the high 32bits being random in x86-64 asm
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
|
From 5010c to 4566 on lagarith YUY2.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
|
See: 368f50359eb328b0b9d67451f56fda20b3255f9a
See: 44eb49512888143905860af2de2932ab002cdbf7, and many others
See:
similarity index 83%
copy from libavcodec/x86/dsputil_init.c
copy to libavcodec/x86/qpeldsp_init.c
index ebbf97f..8f296a1 100644
--- a/libavcodec/x86/dsputil_init.c
+++ b/libavcodec/x86/qpeldsp_init.c
@@ -1,6 +1,5 @@
/*
- * Copyright (c) 2000, 2001 Fabrice Bellard
- * Copyright (c) 2002-2004 Michael Niedermayer <michaelni@gmx.at>
+ * quarterpel DSP functions
*
* This file is part of FFmpeg.
*
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
|
* commit '368f50359eb328b0b9d67451f56fda20b3255f9a':
dsputil: Split off quarterpel bits into their own context
Conflicts:
configure
libavcodec/dsputil.c
libavcodec/h263dec.c
libavcodec/mpegvideo.c
libavcodec/mpegvideo_enc.c
libavcodec/vc1dec.c
libavcodec/vc1dsp.c
libavcodec/x86/dsputil_init.c
libavcodec/x86/qpeldsp.asm
Merged-by: Michael Niedermayer <michaelni@gmx.at>
|
|
* commit '054013a0fc6f2b52c60cee3e051be8cc7f82cef3':
dsputil: Move APE-specific bits into apedsp
Conflicts:
libavcodec/arm/int_neon.S
libavcodec/x86/dsputil.asm
Merged-by: Michael Niedermayer <michaelni@gmx.at>
|
|
See: 5900637219ccccdd39ddafa4e7181da20b8e1f1b
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
|
* commit '65d5d5865845f057cc6530a8d0f34db952d9009c':
dsputil: Move SVQ1 encoding specific bits into svq1enc
Conflicts:
libavcodec/x86/Makefile
Merged-by: Michael Niedermayer <michaelni@gmx.at>
|
|
XOP support was added in Yasm 1.0.0 and Nasm 2.06, and we still
support older versions.
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
|
C MMX SSE2
Cycles: 2972 587 302
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
|
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
|
This makes the naming more consistent with the 8bit variant
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
|
This makes the naming more consistent with the 8bit variant
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
|
This makes the naming more consistent with the 8bit variant
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
|
SSE2: 137 cycles
XOP: 87 cycles
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
|
|