Alfonso Maruccia

In context: Advanced vector extensions are a type of "single instruction, multiple data" extension to the x86 instruction set architecture, implemented by Intel and AMD in modern CPUs. These instructions can significantly enhance parallel processing workloads, especially when used with 512-bit registers and other advanced features available in the AVX-512 instruction set.
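As a rough illustration of why register width matters, the sketch below counts how many elements fit in one vector register and how many vector operations a single pass over a buffer would take. The register widths (128-bit SSE, 256-bit AVX2, 512-bit AVX-512) are standard; the buffer size `ROW_SAMPLES` and the helper names are assumptions made for this example, not anything taken from FFmpeg. Lane count only bounds the per-instruction parallelism and does not by itself predict real-world speedups.

```python
def lanes(register_bits: int, element_bits: int) -> int:
    """Number of elements that fit side by side in one vector register."""
    return register_bits // element_bits


def vector_ops(n_elements: int, register_bits: int, element_bits: int) -> int:
    """Operations needed for one pass over n_elements (ceiling division)."""
    per_op = lanes(register_bits, element_bits)
    return -(-n_elements // per_op)


# Hypothetical workload: one row of 8-bit samples, 1920 samples wide.
ROW_SAMPLES = 1920
ELEMENT_BITS = 8

for name, bits in (("SSE", 128), ("AVX2", 256), ("AVX-512", 512)):
    print(
        f"{name:8s} {lanes(bits, ELEMENT_BITS):3d} lanes, "
        f"{vector_ops(ROW_SAMPLES, bits, ELEMENT_BITS):4d} ops per row "
        f"(scalar: {ROW_SAMPLES} ops)"
    )
```

With 8-bit elements, a 512-bit register holds 64 values at once, compared with 16 for a 128-bit register.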

The FFmpeg team recently highlighted how AVX-512 instructions can deliver a significant performance boost in video processing workloads. According to a slide presented by one of the developers, optimized "handwritten assembly" leveraging these SIMD instructions can accelerate video decoding routines by three to 94 times.

No specifics were provided about the CPU or system used for benchmarking. AVX-512 first appeared in Intel's Xeon Phi x200 (Knights Landing) CPU series in 2016. The substantial performance gains stem from combining AVX-512 vector instructions with highly optimized assembly code; AVX instructions were designed from the outset to enhance SIMD parallel processing.

FFmpeg is a free, open-source software package that offers a comprehensive suite of libraries and tools for handling audio and video streams – a true Swiss army knife of multimedia, used by popular media players like VLC and major corporations including YouTube. The core FFmpeg team oversees the project, while a community of volunteers contributes code and patches.

FFmpeg currently relies on assembly language for about eight percent of its codebase, the developers said, leaving plenty of room for performance improvements. Assembly is a low-level language that few programmers specialize in today, especially since much of the software industry now prioritizes high-level, accessible languages like Python.

Still, skilled developers are always eager to maximize performance on the latest hardware. FFmpeg includes custom "handwritten" decoding routines for both x86 and ARM processors, even as some in the software industry wish for AVX-512 to die "a painful death."

Recently, Intel introduced AVX10, a reimagined ISA that standardizes AVX-512 instructions across all x86 CPU architectures and core types. Before that, Intel had made waves when it disabled AVX-512 support at the firmware level on 12th-gen Core processors and later models, effectively removing the SIMD ISA from its consumer chips.


 
This illustrates an opportunity for AI-enhanced code, or specifically, compilers. In my experience, proficient developers do not have a lot of trouble writing working code. But as illustrated here, it is common for "finished" code to leave a lot of potential performance on the table: individual workers/threads are often not as efficient as they could be, and code commonly does not take advantage of multiple cores as well as it could (and it also tends to have a lot of security holes). In all three cases, that's because it takes extra time to do it right and employers often don't want to spend that money.

Many systems have extra cores that are idle most of the time - software tools that could help developers better use them automatically could have a big impact on the ecosystem.
 
Would love to test it on my Threadripper… I’d love my rendering to take 3-94 times less time 😆
 
We don't need better hardware, we need better programmers/code/programs.

You need AVX512 for this to work, so yeah, most will need new hardware... Zen 5 has it
 
I wonder if this will be incorporated into something like Handbrake, as I know it uses FFmpeg libraries for some of its conversion routines. I'm not conversant with using FFmpeg by itself. On the other hand, I have been using Handbrake for over a decade to shrink ripped DVDs and Blu-rays down to manageable sizes to put on my media servers. Would love to have even a doubling of conversion speeds (let alone 94x) to re-encode all my videos without taking massive amounts of time.
 
You need AVX512 for this to work, so yeah, most will need new hardware... Zen 5 has it

So does Zen 4, though Zen 5 features a large increase in AVX512 performance.

What doesn't have it? Intel mainstream processors starting with the 12th generation. (Xeon workstation and server processors continue to support AVX512.) No Intel CPU with a combination of P and E cores offers AVX512 because the E cores don't implement it. (The P cores do, except that it's turned off.) The problem is that a thread that uses AVX512 would have to be restricted to run only on P cores, and there is no software build chain or OS support for such a restriction.
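For anyone unsure whether their own machine exposes AVX-512, here is a minimal sketch, assuming a Linux x86 system where `/proc/cpuinfo` lists a `flags` line that includes `avx512f` (the AVX-512 Foundation flag) when the feature is available. The function names are made up for this example, and the parser works on any text you pass in, so it can be tried on pasted output as well.

```python
def cpu_flags(cpuinfo_text: str) -> set:
    """Return the feature flags from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()  # no 'flags' line found (e.g. non-x86 or non-Linux output)


def has_avx512f(cpuinfo_text: str) -> bool:
    """True if the AVX-512 Foundation flag is listed."""
    return "avx512f" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo", encoding="utf-8") as fh:
            text = fh.read()
    except OSError:
        text = ""  # file not available on this platform
    print("AVX-512F reported:", has_avx512f(text))
```

A missing flag only means the OS does not report the feature; it does not tell you whether the silicon has it but firmware turned it off.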
 
