The MPAI EEV group has reached another milestone toward a fully neural-network-based video codec. The objective and vision of MPAI EEV is to bridge "traditional" video coding and neural video coding. The latest version of the MPAI EEV reference model, EEV-0.6, has completed its research phase and been finalized. This model reflects state-of-the-art (SOTA) neural video coding technology in the area of video compression based on bi-directional motion-compensated prediction.
Ultra-high-resolution videos contain large quantities of fine-grained textural detail, complicated motion dynamics, and significant signal-intensity variation, affecting both local and global content modelling. In conventional frameworks, coding ultra-high-resolution (UHD) videos has long been a challenging task for several reasons: the large computational load, the difficulty of modelling diverse content with a unified compact representation, and the limited capacity of models to cover a wide range of motion characteristics. Conventional coding tools have achieved only limited gains in rate-distortion efficiency on such content. As a result, previous solutions tend to rely on a simple combination of coding methods with different functionalities to cover the distinct difficulties of UHD video coding.
Prior EEV models studied coding methods under the low-delay configuration. Significant coding gains were obtained, and the EEV models outperformed conventional video coding standards.
Exploiting bi-directional context prediction has long been recognized as a key direction for improving compression efficiency in neural video coding. Since EEV-0.5, major attention has been paid to B-frame-based end-to-end video coding. However, existing neural B-frame codecs still exhibit limited performance gains, particularly on high-resolution videos with large motion, where optical flow estimation becomes unreliable and evenly balanced prediction fusion introduces distortions. To address these challenges, EEV-0.6 presents the first high-resolution bi-directional neural video coding method, termed HR-NVC, which non-uniformly integrates confidence-guided predictive cues from both temporal directions to achieve more reliable and efficient compression. Specifically, EEV-0.6 designs Spatio-Temporal Anchored Motion Estimation, which introduces virtual anchor frames and low-resolution priors to significantly improve estimation robustness under large displacements. This is followed by a novel Hierarchical Motion Representation that combines multi-scale motion with temporal references, enabling compact and adaptive modelling of motion reliability across resolutions. EEV-0.6 further presents a Bi-Contextual Asymmetric Harmonization module that performs confidence-guided fusion of bi-directional references, effectively suppressing unreliable contexts and restoring structural consistency near occlusion and scene-transition regions.
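To give an intuition for confidence-guided, non-uniform fusion of the two temporal directions, the sketch below blends forward and backward predictions with per-pixel confidence weights. It is a minimal illustration only, not the actual EEV-0.6 module: the function name `fuse_bidirectional` and the normalized-weight scheme are assumptions for demonstration; the real Bi-Contextual Asymmetric Harmonization module is a learned network.

```python
import numpy as np

def fuse_bidirectional(pred_fwd, pred_bwd, conf_fwd, conf_bwd, eps=1e-6):
    """Illustrative asymmetric fusion: weight each direction's prediction
    by its (normalized) per-pixel confidence, so unreliable contexts
    (e.g. near occlusions) contribute less to the fused frame.

    pred_*: H x W x C prediction arrays from each temporal direction.
    conf_*: H x W x 1 non-negative confidence maps (broadcast over channels).
    """
    w_fwd = conf_fwd / (conf_fwd + conf_bwd + eps)  # normalize to [0, 1]
    return w_fwd * pred_fwd + (1.0 - w_fwd) * pred_bwd
```

When one direction's confidence drops to zero (e.g. a region occluded in that reference), the fused result falls back entirely on the other direction instead of averaging in an unreliable prediction.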
By introducing the concept of a trustworthy factor for generating enhanced bi-directional predictors, EEV-0.6 adopts a self-contained, robust compression architecture that supports scaled-hierarchical B-frame inter prediction and contextual coding. Moreover, the motion representation scheme has been extensively studied to arrive at a highly efficient representation. This design not only enables flexible and versatile adaptation to diverse motion characteristics but also supports fine-grained quality adjustment of bi-directional prediction accuracy.
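The hierarchical B-frame structure underlying such inter prediction can be sketched as a recursive bisection of a group of pictures (GOP): the two anchors are coded first, then the middle frame as a B-frame with references on both sides, and so on. The helper below is a generic illustration of this coding order, not EEV-0.6's actual scheduler; the function name and GOP handling are assumptions.

```python
def hierarchical_coding_order(gop_size):
    """Return the coding order of frame indices 0..gop_size for a
    hierarchical B-frame GOP: anchors first, then recursively the
    midpoint of each interval, which is predicted bi-directionally
    from the two already-coded frames that bracket it."""
    order = [0, gop_size]  # the two anchor frames are coded first

    def split(lo, hi):
        if hi - lo < 2:
            return          # no frame left between the two references
        mid = (lo + hi) // 2
        order.append(mid)   # B-frame referencing frames lo and hi
        split(lo, mid)      # recurse into the two sub-intervals
        split(mid, hi)

    split(0, gop_size)
    return order

# Example: an 8-frame GOP is coded as 0, 8, 4, 2, 1, 3, 6, 5, 7,
# so every B-frame has a coded reference on each side.
```

Frames deeper in this hierarchy sit between closer references, which is one place a scaled design can spend fewer bits or tolerate lower prediction confidence.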
Regarding rate-distortion efficiency, EEV-0.6 achieves SOTA compression performance and outperforms related neural models as well as conventional video coding standards such as H.266/VVC and H.265/HEVC, marking a solid milestone for research on neural video coding and its associated standardization activities. Notably, the model is the first end-to-end-optimized video codec evaluated on 4K-resolution videos, establishing a new benchmark for higher-resolution NVC and achieving state-of-the-art performance among neural B-frame codecs. The enabling technology behind EEV-0.6 has been accepted to this year's CVPR conference as a highlight presentation and will be made publicly available in the future.