Artificial intelligence applied to video encoding: present and future benefits

Artificial intelligence (AI) is playing an increasingly prominent role in the audio-video market as well. The areas where AI is already used successfully are numerous: a concrete example is the image processing now carried out by all the most sophisticated televisions. The advantages are tangible, as we explained in our in-depth look at the use of AI in Samsung 8K TVs.

There are also other use cases for artificial intelligence: one of these is encoding, the process of compressing video that is then delivered to the end user via television broadcasts or streaming. Thierry Fautier, VP of Strategy at Harmonic, a company specializing in solutions for companies that produce, process and distribute video content for television and the internet, explained the benefits of AI applied to the latest technologies.

The adoption of AI encoding will take place in two phases, the first of which has already been implemented by various companies, including Harmonic itself. The first concerns machine learning combined with codecs such as AVC, HEVC, AV1 and AVS3. The second phase will instead focus on next-generation solutions such as VVC and AV2.

Below are the main topics covered, from the current situation to future developments.



Harmonic already offers an AI-assisted encoding solution called EyeQ Content-Aware Encoding (CAE). Underlying this solution are algorithms that use machine learning to adapt the encoding process to the human visual system. Put simply, optimization focuses on the areas where the viewer’s gaze lands (which is also what Sony does with the Cognitive Processor XR): EyeQ CAE analyzes video quality in real time and spends the available capacity only where and when it is essential, to preserve the most information.
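As a rough illustration of the content-aware idea (not Harmonic's actual algorithm; the function name and saliency values below are invented for the example), one can imagine splitting a frame's bit budget across blocks in proportion to a saliency score that predicts where the viewer's gaze will land:

```python
# Hypothetical sketch of content-aware bit allocation: spend more bits
# where a saliency model predicts the viewer's gaze will concentrate.

def allocate_bits(saliency, total_bits):
    """Split a frame's bit budget across blocks in proportion to saliency.

    saliency: list of per-block saliency scores in [0, 1]
    total_bits: bit budget for the whole frame
    """
    weight_sum = sum(saliency) or 1.0  # guard against an all-zero map
    return [round(total_bits * s / weight_sum) for s in saliency]

# A frame of four blocks: the viewer's gaze concentrates on block 1.
budget = allocate_bits([0.1, 0.7, 0.1, 0.1], total_bits=10_000)
print(budget)  # [1000, 7000, 1000, 1000]
```

In a real encoder this weighting would steer quantization parameters per macroblock rather than raw bit counts, but the principle is the same: uniform quality everywhere is wasteful when attention is not uniform.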

Fautier explains that there are already more than 100 deployments of CAE encoding in the field with AVC and HEVC, almost all designed for OTT (Over-The-Top, i.e. internet-delivered) platforms. Harmonic has accumulated considerable experience in the sector through various tests conducted over the years, such as the experiments carried out during the 2019 edition of Roland Garros, where some matches were broadcast (not freely available to everyone) at 8K resolution with HEVC. Bandwidth savings can reach 40–50% with quality fully comparable to encoding without AI.


EyeQ is based on current encoders and does not require additional computing capacity. It is not the only possible approach, however: other AI-based technologies require greater resources, and these fall into two categories. The first simply demands more GPU/CPU computing power, while the second relies on convolutional neural networks (CNNs).

The first method therefore requires action from the companies that carry out the encoding: the hardware must be upgraded to obtain the necessary computing power. The use of CNNs, currently being studied by groups such as MPEG (Moving Picture Experts Group), distributes the workload differently, shifting more of it to the client side to reduce bandwidth consumption. Balancing these aspects is one of the goals for all researchers: at the moment there is still no approach that can be considered definitive.
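The client-side idea can be illustrated with a deliberately tiny sketch: instead of the encoder doing all the enhancement work, the decoder applies a filter to the received picture. A real CNN stacks many learned convolutional filters in two dimensions; the single hand-written 1-D kernel below is my own illustration of where the extra compute would sit, not anything from MPEG's actual work.

```python
# Minimal sketch of decoder-side filtering: a small convolution applied
# by the client to smooth the received samples, shifting compute from
# the encoder to the playback device.

def conv1d(signal, kernel):
    """Convolve a 1-D signal with a kernel, padding edges by replication."""
    k = len(kernel)
    pad = k // 2
    padded = [signal[0]] * pad + list(signal) + [signal[-1]] * pad
    return [sum(padded[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal))]

decoded = [0, 0, 10, 0, 0]                       # blocky decoded samples
smoothed = conv1d(decoded, [0.25, 0.5, 0.25])    # one smoothing "filter"
print(smoothed)  # [0.0, 2.5, 5.0, 2.5, 0.0]
```

The bandwidth saving comes from the encoder being allowed to send a coarser signal, trusting the client's network to restore perceptual quality; the open research question is exactly how much work can reasonably be pushed onto consumer hardware.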

Fautier then specified that the use of AI, whether in the form of machine learning or deep learning (here we explain the difference), always requires significant resources, so every aspect must be weighed when performing the encoding. A concrete example helps to understand this point: Netflix (like other services) uses AI to find the best possible pairing of resolution and bitrate, arriving at an encoding that takes the most important combinations into account.

The result is very accurate, but it cannot be applied indiscriminately to every type of streaming: it is unsuitable for live content because it could not sustain the load while working in real time, and this is one of the reasons why streaming of movies and TV series should never be compared to sports and other live events.


Fautier also indicated three main areas that AI-assisted encoding is targeting:

  1. encoding with dynamic resolution
  2. encoding with dynamic frame rate
  3. layered encoding

Encoding with dynamic resolution, named Dynamic Resolution Encoding (DRE), is an extension of the system used by streaming services today. Anyone who has used them knows there are various quality profiles tied to certain bandwidth requirements: the better the connection, the higher the profile, up to the maximum available, i.e. higher resolution and/or more frames per second (usually for sporting events). The state of the art reached over the years is normally referred to as “per-title encoding” and is adopted by Netflix and other providers.

Encoding is carried out by balancing storage and bandwidth requirements in order to optimize all profiles (weighing resolution against bitrate). Content is analyzed scene by scene and encoded at all supported resolutions: only at the end of the process are the results compared to determine the optimal balance for each profile. DRE technology performs all these operations in a single pass and is therefore less demanding in terms of resources, enough to also be suitable for live events.
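The comparison step at the end of per-title encoding can be sketched as follows. The quality scores are made-up stand-ins for a real metric such as VMAF, and the function is purely illustrative: it keeps, for each bitrate, the resolution whose trial encode scored best.

```python
# Illustrative per-title selection: encode each scene at every supported
# resolution/bitrate pair, then keep the resolution that maximizes the
# measured quality at each bitrate.

def best_resolution_per_bitrate(measurements):
    """measurements: {(resolution, bitrate_kbps): quality_score}"""
    best = {}
    for (res, bitrate), score in measurements.items():
        if bitrate not in best or score > best[bitrate][1]:
            best[bitrate] = (res, score)
    return {bitrate: res for bitrate, (res, _) in best.items()}

trial_encodes = {
    ("1080p", 2000): 78, ("720p", 2000): 82,   # at 2 Mbps, 720p looks better
    ("1080p", 5000): 93, ("720p", 5000): 88,   # at 5 Mbps, 1080p wins
}
ladder = best_resolution_per_bitrate(trial_encodes)
print(ladder)  # {2000: '720p', 5000: '1080p'}
```

This exhaustive encode-then-compare loop is what makes classic per-title encoding too slow for live: DRE's appeal is collapsing it into a single pass.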

Encoding with a dynamic frame rate allows the encoder to use only the number of frames a given piece of content actually needs. For essentially static images, 30 frames per second or even lower values may be sufficient, while sports require more, generally matching the encode to the frame rate at which the footage is shot. The goal of dynamic frame rate is to reduce the computing capacity required for encoding; it is a technique researchers have studied for years without success. Thanks to artificial intelligence, which can analyze the source in real time, concrete results have already been seen, according to Fautier.
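A hedged sketch of the decision such a system has to make: estimate motion between consecutive frames and lower the encode frame rate when the scene is nearly static. The thresholds and rates below are illustrative guesses, not values from Harmonic's implementation.

```python
# Toy dynamic-frame-rate decision based on a simple motion estimate.

def pick_frame_rate(motion_score, source_fps=60):
    """motion_score: mean absolute pixel difference between
    consecutive frames, on a 0-255 scale."""
    if motion_score < 2:       # near-static content: slides, talking heads
        return 30
    if motion_score < 10:      # moderate motion: half the source rate
        return max(30, source_fps // 2)
    return source_fps          # sport / fast action: keep the full rate

print(pick_frame_rate(0.5))   # 30  (static scene)
print(pick_frame_rate(25))    # 60  (fast action at a 60 fps source)
```

The hard part AI addresses is computing a reliable motion estimate in real time; a naive per-pixel difference like the one assumed here would be fooled by noise, lighting changes and camera pans.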

We conclude with layered encoding, a very interesting and ingenious system that has already seen some implementations. What is it about? This approach encodes content in several layers: the base layer might be at Ultra HD resolution, with an additional layer containing enhancements, such as extra detail for 8K. The two layers can be transmitted using the same technology or over different channels. To save bandwidth, one could broadcast an Ultra HD TV signal with the additional layer sent over a network connection. An Ultra HD TV would play only the base layer, while an 8K TV would use the additional layer’s data to reconstruct the signal at maximum definition.

Layered encoding can already be implemented today with scalable HEVC, or via scalable VVC or LCEVC. In the case of HEVC, adopted for television broadcasting in North America’s ATSC (Advanced Television Systems Committee) 3.0 standard, the HD base layer can serve mobile devices and the additional layer Ultra HD TVs. Another example is Samsung’s ScaleNet technology, an application of neural networks that relies on pre- and post-processing. In the encoding phase (the “pre”), the AI downscales 8K content for distribution as Ultra HD video. Metadata inserted into the video stream (the additional layer) lets TVs reconstruct the original 8K signal (the “post”).
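The split-and-recombine idea behind all these schemes can be shown on a toy 1-D "signal". Real scalable codecs (SHVC, LCEVC) are far more sophisticated, transforming and compressing each layer; this sketch only demonstrates that a base layer plus a residual enhancement layer losslessly rebuilds the original.

```python
# Toy layered delivery: the base layer carries a downscaled signal,
# the enhancement layer carries the residual needed to rebuild the
# full-resolution one.

def encode_layers(signal):
    base = signal[::2]                                   # crude 2x downscale
    upscaled = [v for v in base for _ in (0, 1)][:len(signal)]
    residual = [s - u for s, u in zip(signal, upscaled)]  # enhancement layer
    return base, residual

def decode_full(base, residual):
    upscaled = [v for v in base for _ in (0, 1)][:len(residual)]
    return [u + r for u, r in zip(upscaled, residual)]

src = [10, 12, 20, 21, 5, 7]
base, enh = encode_layers(src)
print(base)                    # [10, 20, 5]  - a "UHD" device plays only this
print(decode_full(base, enh))  # [10, 12, 20, 21, 5, 7]  - "8K" reconstruction
```

The bandwidth argument is that the residual is typically small and cheap to compress, so the two layers together cost far less than two independent full-resolution streams.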

All the solutions described are naturally subject to change and rapid evolution: applications of artificial intelligence to encoding are still under development and potentially subject to standardization by the consortia operating in these fields. The mainstream adoption of higher-resolution streaming and television broadcasts, i.e. Ultra HD and 8K, will certainly benefit greatly: tools that save as much bandwidth as possible are the only way to secure the support of broadcasters and service operators.
