Age | Commit message (Collapse) | Author |
|
None of them are specific to the YASM assembler.
(Cherry-picked from libav commit 39e208f4d4756367c7cd2d581847e0c1b8a429c1)
Signed-off-by: James Almer <jamrial@gmail.com>
|
|
benchmark:
sse2 10.670s
avx 8.763s
fma3 8.380s
Signed-off-by: Muhammad Faiz <mfcc64@gmail.com>
|
|
separate dsp.resample to dsp.resample_common and dsp.resample_linear
and choose to call faster resample_common even when linear_interp=on
when c->frac and c->dst_incr_mod are both zero
speed up resampling when exact_rational and linear_interp are both
enabled because exact_rational force c->frac and c->dst_incr_mod to
be zero when soft compensation does not happen
benchmark on exact_rational=on:linear_interp=on
old new
real 8.432s 5.097s
user 7.679s 4.989s
sys 0.125s 0.107s
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Muhammad Faiz <mfcc64@gmail.com>
|
|
phase_shift and phase_mask is removed
generally exact_rational=on is faster than exact_rational=off
Signed-off-by: Muhammad Faiz <mfcc64@gmail.com>
|
|
give high quality resampling
as good as with linear_interp=on
as fast as without linear_interp=on
tested visually with ffplay
ffplay -f lavfi "aevalsrc='sin(10000*t*t)', aresample=osr=48000, showcqt=gamma=5"
ffplay -f lavfi "aevalsrc='sin(10000*t*t)', aresample=osr=48000:linear_interp=on, showcqt=gamma=5"
ffplay -f lavfi "aevalsrc='sin(10000*t*t)', aresample=osr=48000:exact_rational=on, showcqt=gamma=5"
slightly speed improvement
for fair comparison with -cpuflags 0
audio.wav is ~ 1 hour 44100 stereo 16bit wav file
ffmpeg -i audio.wav -af aresample=osr=48000 -f null -
old new
real 13.498s 13.121s
user 13.364s 12.987s
sys 0.131s 0.129s
linear_interp=on
old new
real 23.035s 23.050s
user 22.907s 22.917s
sys 0.119s 0.125s
exact_rational=on
real 12.418s
user 12.298s
sys 0.114s
possibility to decrease memory usage if soft compensation is ignored
Signed-off-by: Muhammad Faiz <mfcc64@gmail.com>
|
|
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: James Almer <jamrial@gmail.com>
|
|
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
|
|
Only two functions that use xop multiply-accumulate instructions where the
first operand is the same as the fourth actually took advantage of the macros.
This further reduces differences with x264's x86inc.
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
|
|
Silences warnings with Nasm
Signed-off-by: James Almer <jamrial@gmail.com>
|
|
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
|
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
|
Signed-off-by: James Almer <jamrial@gmail.com>
|
|
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
|
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
|
int32/float only
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
|
|
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
|
|
Until a proper fix is committed.
Signed-off-by: James Almer <jamrial@gmail.com>
|
|
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
|
|
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
|
|
13797 decicycles in ff_float_to_int32_a_sse2, 32768 runs, 0 skips
8603 decicycles in ff_float_to_int32_a_avx2, 32766 runs, 2 skips
Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
|
|
There's no benefit from using blendps here except on CPUs with AVX, where
it's faster than shufps according to Intel's documentation.
As such, rename the sse4 functions to sse/sse2 and use shufps instead.
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
|
|
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
|
Also rename resample_x86_dsp.c to resample_init.c
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
|
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
|
The swresample_ prefix is not for internal functions
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
|
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
|
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
|
Signed-off-by: James Almer <jamrial@gmail.com>
312531 -> 311528 dezicycles
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
|
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
|
Linear interpolation goes from 63 (llvm) or 58 (gcc) to 48 (yasm)
cycles/sample on 64bit, or from 66 (llvm/gcc) to 52 (yasm) cycles/
sample on 32bit. Bon-linear goes from 43 (llvm) or 38 (gcc) to
32 (yasm) cycles/sample on 64bit, or from 46 (llvm) or 44 (gcc) to
38 (yasm) cycles/sample on 32bit (all testing on OSX 10.9.2, llvm
5.1 and gcc 4.8/9).
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
|
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
|
Should fix compilation failures with MSVC and any other compiler
without inline asm support.
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
|
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
|
DSP bits of swri_resample go into their own mini-DSP functions; DSP
init goes from a per-call branch in multiple_resample to a proper
DSP init routine; x86 bits go into x86/; swri_resample() moves out of
resample_template.c into resample.c because it's independent of DSP
code or sample type; multiple_resample() is simplified.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
|
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
|
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
|
Might fix fate-swr on ICL
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
|
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
|
About two times faster
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
|
About three times faster
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
|
At least two times faster than the C version.
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
|
symbol reference in inline asm are not supported.
This is part of the patch-set for intel C inline asm on windows support
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
|
pshuf+paddd is slightly faster than phaddd.
The real gain is in pre-ssse3 processors like AMD K8 and K10, which get
a big boost in performance compared to the mmxext version
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
|
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
|
These are not supported by all compilers (gcc 2.95 but also older SPARC
compilers, see gcc bug #33304 for example), and there is no real need for them.
One use of this feature remains in libavdevice/v4l2.c which can't be
replaced quite as easily.
Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>
|
|
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
|
Might fix Ticket1874
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
|
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
|
* commit '6860b4081d046558c44b1b42f22022ea341a2a73':
x86: include x86inc.asm in x86util.asm
cng: Reindent some incorrectly indented lines
cngdec: Allow flushing the decoder
cngdec: Make the dbov variable have the right unit
cngdec: Fix the memset size to cover the full array
cngdec: Update the LPC coefficients after averaging the reflection coefficients
configure: fix print_config() with broke awks
Conflicts:
libavcodec/x86/ac3dsp.asm
libavcodec/x86/dct32.asm
libavcodec/x86/deinterlace.asm
libavcodec/x86/dsputil.asm
libavcodec/x86/dsputilenc.asm
libavcodec/x86/fft.asm
libavcodec/x86/fmtconvert.asm
libavcodec/x86/h264_chromamc.asm
libavcodec/x86/h264_deblock.asm
libavcodec/x86/h264_deblock_10bit.asm
libavcodec/x86/h264_idct.asm
libavcodec/x86/h264_idct_10bit.asm
libavcodec/x86/h264_intrapred.asm
libavcodec/x86/h264_intrapred_10bit.asm
libavcodec/x86/h264_weight.asm
libavcodec/x86/vc1dsp.asm
libavcodec/x86/vp3dsp.asm
libavcodec/x86/vp56dsp.asm
libavcodec/x86/vp8dsp.asm
Merged-by: Michael Niedermayer <michaelni@gmx.at>
|
|
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|