>What I’m more excited about is AV1. AV1 is a next-gen codec developed by a consortium of tech giants (Apple, Google, MS, etc.) to create a royalty-free codec. IIRC some of its baseline is based on VP10, which Google scrapped and donated the code to AV1’s development. Unfortunately the reference encoder libaom is at a very early stage and takes forever to encode. SVT-AV1 is Intel’s scalable implementation (later joined by Netflix). It doesn’t perform as well as libaom, but quality is still better than x265 (SSIM), and encode times are way more reasonable. For now, they look promising, and I am excited to see results in a few years. HEVC took 3-4 years before it was more widely accepted by anime encoders, and it took 5 years, until 2018, before I began experimenting with it. AV1 began major development in mid-2018, so by that logic we have 2-3 more years to go.
Well, it's 2021 now, would be interesting to see a guide for AV1 encoding in the same vein. As I understand it, both of the leading AV1 encoders are still undergoing frequent changes and improvements, but it still would be nice to have a guide to help one to get a handle on things.
AMD is a huge AV1 supporter, both improving the speed of software decoding and encoding and adding the ability for Radeon's video ASIC to handle it.
If you have a processor without integrated graphics (more common on AMD than Intel, although Intel also sells some) you don't get hardware accelerated encoding anyway, since that's a feature of the GPU.
No current iGPU or dedicated GPU (e.g. NVENC on NVIDIA[1]) supports AV1 encoding today, so talking about CPU support is odd. Encoders like SVT-AV1 are software-only.
>There is no simple answer to fix these 2 problems due to crf targets. Traditionally in x264 such scenes will simply end up with blocking artifacts. x265 chooses to eliminate artifacts at the cost of detail loss. The downside is that even at lower crf targets it is tough to eliminate x265’s tendency to blur. To truly eliminate such effects, you will first need no-sao:no-strong-intra-smoothing:deblock=-1,-1 to make x265 behave more like x264, then raise psy-rd and psy-rdoq accordingly (2 and 5 respectively should do the trick). However, this reintroduces the unpleasant artifacts x265 aimed to eliminate in the first place, thus I do not recommend such encode parameters unless encoding at crf <16 (in which case file sizes are so big, just use x264 — why even bother with x265?).
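For anyone wanting to try those settings, here is a hedged sketch of how they might be passed through ffmpeg's libx265 wrapper. File names and the CRF value are placeholders; verify the option names against your x265 build.

```shell
# Disable SAO and strong intra smoothing, sharpen deblock, raise psy strength.
# This mirrors the quoted parameters; input/output names are made up.
ffmpeg -i input.mkv \
  -c:v libx265 -crf 15 -preset slow \
  -x265-params "no-sao=1:no-strong-intra-smoothing=1:deblock=-1,-1:psy-rd=2.0:psy-rdoq=5.0" \
  -c:a copy output.mkv
```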
There might be something to be said for perceptible degradation: the characteristic macroblock vomit produced by bit-starvation gives a sure signal to the viewer that something went wrong with the video encoding, whereas a smooth looking image with some details omitted can produce a false impression of what the video signal is, with no indication as to whether or not this is taking place (imagine this happening, for example in security camera footage).
That quote sounds like someone trying to justify something to themselves.
Otherwise it's a very long-winded way of saying "use the x265 defaults", which doesn't really need saying at all.
Presumably it's a leftover from the x264 community, where blur (a valid compression technique that can be overused) was demonized probably beyond what the evidence supported, so now they need to rein that bias in when blur is actually the best option available.
I don't mind blur in anime at all, but it's very noticeable (without using pause or a magnifying glass) in live-action footage, even at low CRF. I have to turn off `SAO` and `strong-intra-smoothing` when using x265 for this reason.
> Bonus conversation: Due to x264 being much less complex, presets can pretty much maintain only a slight loss in quality as you encode faster. This gives the illusion that slower presets are slower due to spending more time compressing. The more correct way to see this is that slower presets are slower due to doing more motion calculations and finding the scheme that best describes the frame, which in x264 just so happens to benefit compression too. However, x265 is waaay more complex with motion algorithms, meaning that accurately describing motion actually increases bitrate.
This is totally confusing me. If the scheme that best describes the frame actually uses more data, and the user is targeting a lower bitrate/quality, shouldn't the encoder realize that the less accurate description is a more efficient choice, and select that?
The author is pretty confused. x264’s slower presets enable searching more points in ME and a more accurate RDO (among other things), just like x265’s. But in all cases for any modern encoder, all mode decisions are made based on the estimated bit cost relative to distortion, so the slower presets are just evaluating more options. (kinda)
That said, it is the case across all codecs that more accurate motion vector fields (e.g. using optical flow) are not generally optimal. A simple way of reasoning about it is that one block reusing the same motion vector as a neighboring block is much cheaper than encoding a marginally different one, and signaling that difference often costs more bits than it saves in the residual.
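A toy illustration of that trade-off, with invented numbers: mode decisions minimize a Lagrangian cost J = D + λ·R, so a cheap-to-signal neighbor vector can beat a more accurate one even though its residual distortion is higher.

```shell
# Toy rate-distortion comparison; all numbers are invented for illustration.
LAMBDA=20

# Option A: accurate motion vector -> small residual (low distortion),
# but the motion vector difference costs many bits to signal.
D_A=100; R_A=40

# Option B: reuse the neighbor's vector -> larger residual,
# but signaling the reuse is nearly free.
D_B=300; R_B=2

J_A=$((D_A + LAMBDA * R_A))   # 100 + 20*40 = 900
J_B=$((D_B + LAMBDA * R_B))   # 300 + 20*2  = 340
echo "J_A=$J_A J_B=$J_B"      # option B wins despite the worse match
```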
Anyway, CRF was never designed to map to the same quality across different encoder settings, which, having actually read the article, is what I think they’re really complaining about.
So what is designed for that? Generally as an encoder you want to target a certain (perceptual) quality level and when testing different settings you're aiming to hold that constant and see whether you achieve a lower bitrate. There should be a way to do that.
And the issue with that is that they're using PSNR and SSIM as quality metrics, which are useful for images, but haven't been designed for video, so they cannot take into account temporal factors and how motion masks distortion, etc.
But even after choosing a method, I want to be able to set target quality in a meaningful way. Having a parameter that I can set to specify quality on a method-agnostic scale seems like a basic usability feature.
It only targets a bitrate if you tell it to. Constant quality/crf mode doesn't care about bitrate, it just applies a quality level to whatever makes it to intra prediction (which is the stage after inter prediction, where all the motion compensation is).
> It was a re-encode at crf=22.5, with zero encoder tuning and preset at fast. Needless to say, it was blocking and artifacts galore. The icing on the cake? The audio was FLAC “to preserve audio quality”… I’m pretty sure there are more people in this world with 1080p screens than high-end headphones lol.
Not to say the author is wrong, as their example is very drastic, but I don’t think this matter is so black-and-white.
1) Audio quality matters more than the quality of the video in terms of viewing satisfaction (this one is my subjective opinion).
2) The better your audio system, the more important the source material quality.
3) Lossy compression is a one-way road. Did the distributor re-encode from lossless, or apply their own lossy encoding on top of an already lossy stream because they couldn’t be bothered to reconfigure their tools? You don’t even have the data to tell how messed up your sound is. Meanwhile, with lossless there are straightforward ways to tell whether it’s “true lossless” (i.e., wasn’t up-coded from a lossy format).
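One such straightforward check is a spectrogram: lossy codecs typically low-pass the signal, so a hard frequency ceiling around 16-20 kHz in a "lossless" file suggests a lossy ancestor. A sketch using ffmpeg's showspectrumpic filter (file names are placeholders):

```shell
# Render a spectrogram of the audio; a sharp horizontal cutoff well below
# ~22 kHz usually means the "lossless" file was up-coded from a lossy source.
ffmpeg -i suspect.flac -lavfi "showspectrumpic=s=1920x1080:legend=1" spectrum.png
```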
4) Distributing truly lossless video is currently just plain impossible (from obtaining the requisite master files to dealing with such volumes of data), so we are choosing between “different degrees of evil” out of necessity. Distributing lossless audio ultimately is feasible, at the expense of a couple hundred megabytes.
In the end, if we’re speaking about a work of value (e.g., a good classic film), I wouldn’t mind spending extra megabytes on preserving the original audio.
As the article points out, if the source is already lossy (which is (almost) all broadcast, DVD, and web sources) then you should just copy that bitstream into the resulting video (transcoding lossy to FLAC gives you the same quality at higher bitrate and is the worst of both worlds). The only common source you would get true lossless audio from is BluRay discs and even in that case the recommended audio encoding settings the author gives should be well into transparency for all but the most golden-eared listeners, and they would likely only be able to detect the difference in an AB comparison with the original source.
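In ffmpeg terms, passing the original lossy track through untouched is just a stream copy (names and video settings here are placeholders):

```shell
# Re-encode the video but copy the existing AC3/DTS/AAC track bit-for-bit:
# no generation loss, and no FLAC-sized bitrate for already-lossy audio.
ffmpeg -i source.mkv -c:v libx265 -crf 18 -preset slow -c:a copy output.mkv
```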
Agreed that up-coding is bad. Still think preserving lossless, if it is available in the first place, is worth it, and a couple hundred megabytes probably don’t matter to someone who chose a quality download rather than streaming.
The point that “the entirety of one’s experience is fully available to one’s conscious mind, so if one can’t tell the difference then it doesn’t matter” can be a philosophical question on which we’ll disagree.
DVDs can use PCM, however every commercial release I've seen uses Dolby Digital/AC3/A52 or DTS. (Apparently PAL DVDs may also use MP2 but as an American I've never personally encountered that on a commercially-released disc.)
My bad, shouldn't generalize. It probably depends on region and genre then. I collect Japanese music DVDs (music videos, lives), 99% of them use stereo PCM.
Yeah, I mostly collect(ed) films and TV. I can imagine that the people authoring music DVDs would decide the bitrate tradeoff would be worth it to get the better quality when the focus of the disc is on the sound.
Well-compressed audio is indistinguishable from an uncompressed stream for 99% of the population using 99% of listening devices (even though those keep improving on average).
Nobody is arguing 96kbit mono should be enough for everyone obviously.
The point is that once you receive lossy audio, you can’t even verify how well it was compressed. Can you trust that it was encoded from a lossless source and not re-encoded a hundred times? Do you consider “it’s all good, unless you actually notice bad sound quality with the naked ear in the middle of the film” a sound principle from a viewing-experience standpoint?
(By the way, your numbers look like you pulled them out of nowhere, but if we are doing this then let me tell you: 99% of population will just stream. This discussion is already moot to everyone but the most picky viewers.)
>The more threads you add into a pool, the more encode overhead you will experience, since every row of encode requires the upper right CTU block to complete before it can proceed.
Nowadays, we don't have to worry about that, since we can use scene-based encoding instead. [1][2] In addition to allowing for full use of all cores, without any loss of encoding or processing efficiency, different encoding settings may be chosen for each scene, increasing potential efficiency.
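As a sketch of what scene-based (chunked) encoding looks like in practice, the third-party Av1an tool splits the input at detected scene changes and encodes the chunks in parallel. Flags below are from memory and may have changed; consult `av1an --help` before relying on them.

```shell
# Split input.mkv at detected scene changes and encode the chunks in
# parallel with 8 worker processes, using aomenc for each chunk.
av1an -i input.mkv --encoder aom --workers 8 -o output.mkv
```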
I’ve wondered about this a lot. Anime (and much hand-drawn animation in general) is typically fairly predictable. Objects, expressions, and even motions are reused frequently.
Shapes are often basic enough that a “vector” graphic could adequately represent their structure (in far fewer bytes).
If there were a way to structure anime into their core components, could we compress the video more like LZMA?
The JPEG XL bitstream format contains a "spline" primitive. Unfortunately, these don't get encoded particularly efficiently, and there aren't any encoders that can take advantage of it yet. JPEG XL can do animations, but it's not supposed to replace video codecs. It would be interesting to see similar features put into an actual video codec.
I think this would be like trying to turn a png back into an svg. The pixels on your screen might be exactly the same, but once it's rasterized, you can't really go backwards.
My experience is that VP9 is often a better choice than HEVC. The article mostly compared x265 to x264 because VP9 was not endorsed by Apple. That is no longer the case, and I've found VP9 anime videos to have better quality at similar bitrates.
Recent VPx encoders are also better at multi-threading. For instance with ffmpeg: `-c:v vp9 -row-mt 1 -crf 26`
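A fuller invocation along those lines — note that libvpx-vp9 needs `-b:v 0` for `-crf` to act as a pure constant-quality target; the tile/thread values here are illustrative, not tuned:

```shell
# Constant-quality VP9 with row-based multithreading and column tiles,
# plus Opus audio; -b:v 0 makes -crf a pure quality target.
ffmpeg -i input.mkv \
  -c:v libvpx-vp9 -crf 26 -b:v 0 -row-mt 1 -tile-columns 2 -threads 8 \
  -c:a libopus -b:a 128k output.webm
```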
Looking at a number of device manuals (my car, my Garmin watch, ...) many of them claim to only support AAC files "encoded by the Apple encoder".
I think that's because AAC supports a wide range of coding tools on paper and off-brand encoders might use them, but everybody supports the ones the Apple encoder uses.
So basically, how can we use free and open technology to make the smallest file size while trying to retain as much quality as possible?
This also doesn't touch on what the anime is being sourced from. Furthermore, it only covers encoder usage, not pre-processing the source, such as with AviSynth in some cases.
Yeah, from my experience the "hard" part is always finding the right (albeit subjective) filtering in AVS/VapourSynth depending on the source.
Most major release groups and fansub teams don't really vary their encoding parameters much.
The author is choosing not to name names, i.e. they aren't pointing a finger at the specific person/site they're criticizing.
"cough" or mumble is a slightly humorous shorthand/convention for this. It suggests an audio transcript mistake, that conveniently doesn't reveal the guilty party.
This is an interesting insight to piracy culture - I didn't know they had some kind of unofficial understanding between them on tech standards. Kind of funny!
(I think the biggest hurdle for HEVC is that it isn't available on a lot of devices. Sure, if you want to watch a video on a computer, HEVC will work. But if you want to watch the same video on your TV, chances are that it won't work. H.264 / AVC however is now near universal on most LED TVs sold).