H.265 is built on incremental gains accumulated over time; the underlying paradigms have largely remained the same. What are the odds of deep learning providing a huge performance gain? Here is the target metric, along with my assumptions:
- Same perceptual quality at a 2x reduction in bit-rate.
- Unlimited compute power at the encoder.
- No speed requirement.
Of course, I understand this is an ill-posed question because so many factors are involved. But if it can be done, where would the performance gain come from? Perhaps from the following?
- Larger CTUs than the 64x64 maximum in H.265, perhaps even moving toward object-based coding.
- Better block-matching algorithms that capture longer-range dependencies.
- A perceptual loss for block matching instead of MAD or MSE.
- Etc...
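To make the block-matching point concrete, here is a minimal sketch of the classic motion-estimation step: an exhaustive search over a window, scored with MAD (swapping in MSE or a perceptual cost only changes one line). The function name, window size, and exhaustive search are illustrative assumptions; real encoders use fast search patterns, not full search.

```python
import numpy as np

def block_match(ref, target_block, top, left, search=8):
    """Find the block in `ref` near (top, left) that best matches
    `target_block` under a MAD cost; returns the motion vector and cost.
    Illustrative exhaustive search, not a production motion estimator."""
    h, w = target_block.shape
    best_cost, best_mv = np.inf, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + h > ref.shape[0] or x + w > ref.shape[1]:
                continue
            cand = ref[y:y + h, x:x + w]
            # MAD cost; for MSE use np.mean((cand - target_block) ** 2),
            # and a perceptual loss would replace this line entirely.
            cost = np.mean(np.abs(cand - target_block))
            if cost < best_cost:
                best_cost, best_mv = cost, (dy, dx)
    return best_mv, best_cost
```

The choice of cost function is exactly where a learned perceptual metric could slot in without changing the rest of the pipeline.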
Answer
[EDIT: addition of a March 2017 preprint]
Deep learning already has many applications in video, like enhancement (Deep Convolutional Neural Network for Decompressed Video Enhancement) or semantic analysis.
Recently, there have been some announcements related to video compression, for instance:
> Traditional image and video compression algorithms rely on hand-crafted encoder/decoder pairs (codecs) that lack adaptability and are agnostic to the data being compressed. Here we describe the concept of generative compression, the compression of data using generative models, and show its potential to produce more accurate and visually pleasing reconstructions at much deeper compression levels for both image and video data. We also demonstrate that generative compression is orders-of-magnitude more resilient to bit error rates (e.g. from noisy wireless channels) than traditional variable-length entropy coding schemes.
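The core idea of generative compression is to replace a hand-crafted transform with a learned latent representation: encode data into a small code, then reconstruct from it with the learned model. Here is a crude numpy sketch where a linear autoencoder (equivalent to PCA) stands in for the deep generative model; the data, dimensions, and linear model are assumptions purely for illustration, not the method from the preprint.

```python
import numpy as np

# Generate low-rank "signal" data: 500 samples of dimension 64 that actually
# live near an 8-dimensional subspace, plus a little noise.
rng = np.random.default_rng(0)
n, dim, latent = 500, 64, 8
data = rng.normal(size=(n, latent)) @ rng.normal(size=(latent, dim))
data += 0.01 * rng.normal(size=data.shape)

# "Learn" the analysis transform from the data itself (PCA via SVD).
mean = data.mean(axis=0)
_, _, vt = np.linalg.svd(data - mean, full_matrices=False)
encoder = vt[:latent]                  # learned transform, 64 -> 8

codes = (data - mean) @ encoder.T      # encode: 8x fewer numbers per sample
recon = codes @ encoder + mean         # decode from the latent code

err = np.mean((recon - data) ** 2)     # small, because the model fits the data
```

The point is adaptability: the transform is fit to the data distribution rather than fixed in the standard, which is the property the preprint's deep models push much further.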
How important deep learning is here, and what performance is actually obtained, is not yet clear to me.