>What I’m more excited about is AV1. AV1 is a next-gen codec developed by a consortium of tech giants (Apple, Google, MS, etc.) to create a royalty-free codec. IIRC some of its baseline is based on VP10, which Google scrapped and donated the code to AV1’s development. Unfortunately the reference encoder libaom is at a very early stage and takes forever to encode. SVT-AV1 is Intel’s scalable implementation (later joined by Netflix). It doesn’t perform as well as libaom, but quality is still better than x265 (SSIM), and encode times are way more reasonable. For now, they look promising, and I am excited to see results in a few years. HEVC took 3-4 years before it was more widely accepted by anime encoders, and it took 5 years, until 2018, before I began experimenting with it. AV1 began major development in mid-2018, so by that logic we have 2-3 more years to go.
Well, it's 2021 now, would be interesting to see a guide for AV1 encoding in the same vein. As I understand it, both of the leading AV1 encoders are still undergoing frequent changes and improvements, but it still would be nice to have a guide to help one to get a handle on things.
AMD is a huge AV1 supporter, both improving the speed of software decoding and encoding and adding the ability for Radeon's video ASIC to handle it.
If you have a processor without integrated graphics (more common on AMD than Intel, although Intel also sells some) you don't get hardware accelerated encoding anyway, since that's a feature of the GPU.
No current iGPU or dedicated GPU (e.g. NVENC on NVIDIA[1]) supports AV1 encoding today, so talking about CPU support is odd. Encoders like SVT-AV1 are software-only.
>There is no simple answer to fix these 2 problems due to crf targets. Traditionally in x264 such scenes will simply end up with blocking artifacts. x265 chooses to eliminate artifacts at the cost of detail loss. The downside is that even at lower crf targets it is tough to eliminate x265’s tendency to blur. To truly eliminate such effects, you will first need no-sao:no-strong-intra-smoothing:deblock=-1,-1 to make x265 behave more like x264, then raise psy-rd and psy-rdoq accordingly (2 and 5 respectively should do the trick). However, this reintroduces the unpleasant artifacts x265 aimed to eliminate in the first place, thus I do not recommend such encode parameters unless encoding at crf <16 (in which case file sizes are so big, just use x264 — why even bother with x265?).
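For anyone wanting to try those settings, here is a hedged sketch of how they might be passed through ffmpeg's libx265 wrapper. File names and the CRF value are placeholders; verify the option names against your x265 build.

```shell
# Disable SAO and strong intra smoothing, sharpen deblock, raise psy strength.
# This mirrors the quoted parameters; input/output names are made up.
ffmpeg -i input.mkv \
  -c:v libx265 -crf 15 -preset slow \
  -x265-params "no-sao=1:no-strong-intra-smoothing=1:deblock=-1,-1:psy-rd=2.0:psy-rdoq=5.0" \
  -c:a copy output.mkv
```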
There might be something to be said for perceptible degradation: the characteristic macroblock vomit produced by bit-starvation gives a sure signal to the viewer that something went wrong with the video encoding, whereas a smooth looking image with some details omitted can produce a false impression of what the video signal is, with no indication as to whether or not this is taking place (imagine this happening, for example in security camera footage).
That quote sounds like someone trying to justify something to themselves.
Otherwise it's a very long-winded way of saying "use the x265 defaults", which doesn't really need saying at all.
Presumably it's a leftover from the x264 community, where blur (a valid compression technique that can be overused) was demonized probably beyond what the evidence supported, so now they need to rein that bias in when blur is actually the best option available.
I don't mind blur in anime at all, but it's very noticeable (without using pause or a magnifying glass) in live-action footage, even at low CRF. I have to turn off `SAO` and `strong-intra-smoothing` when using x265 for this reason.
> Bonus conversation: Due to x264 being much less complex, presets can pretty much maintain only a slight loss in quality as you encode faster. This gives the illusion that slower presets are slower due to spending more time compressing. The more correct way to see this is that slower presets are slower due to doing more motion calculations and finding the scheme that best describes the frame, which in x264 just so happens to benefit compression too. However, x265 is waaay more complex with motion algorithms, meaning that accurately describing motion actually increases bitrate.
This is totally confusing me. If the scheme that best describes the frame actually uses more data, and the user is targeting a lower bitrate/quality, shouldn't the encoder realize that the less accurate description is a more efficient choice, and select that?
The author is pretty confused. x264’s slower presets enable searching more points in ME and a more accurate RDO (among other things), just like x265’s. But in all cases for any modern encoder, all mode decisions are made based on the estimated bit cost relative to distortion, so the slower presets are just evaluating more options. (kinda)
That said, it is the case across all codecs that more accurate motion vector fields (e.g. using optical flow) are not generally optimal. A simple way of reasoning about it is that one block reusing the same motion vector as a neighboring block is much cheaper than encoding a marginally different one, and signaling that difference often costs more bits than it saves in the residual.
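A toy illustration of that trade-off, with invented numbers: mode decisions minimize a Lagrangian cost J = D + λ·R, so a cheap-to-signal neighbor vector can beat a more accurate one even though its residual distortion is higher.

```shell
# Toy rate-distortion comparison; all numbers are invented for illustration.
LAMBDA=20

# Option A: accurate motion vector -> small residual (low distortion),
# but the motion vector difference costs many bits to signal.
D_A=100; R_A=40

# Option B: reuse the neighbor's vector -> larger residual,
# but signaling the reuse is nearly free.
D_B=300; R_B=2

J_A=$((D_A + LAMBDA * R_A))   # 100 + 20*40 = 900
J_B=$((D_B + LAMBDA * R_B))   # 300 + 20*2  = 340
echo "J_A=$J_A J_B=$J_B"      # option B wins despite the worse match
```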
Anyway, CRF was never designed to map to the same quality across different encoder settings, which, having actually read the article, is what I think they’re really complaining about.
So what is designed for that? Generally as an encoder you want to target a certain (perceptual) quality level and when testing different settings you're aiming to hold that constant and see whether you achieve a lower bitrate. There should be a way to do that.
And the issue with that is that they're using PSNR and SSIM as quality metrics, which are useful for images, but haven't been designed for video, so they cannot take into account temporal factors and how motion masks distortion, etc.
But even after choosing a method, I want to be able to set target quality in a meaningful way. Having a parameter that I can set to specify quality on a method-agnostic scale seems like a basic usability feature.
It only targets a bitrate if you tell it to. Constant quality/crf mode doesn't care about bitrate, it just applies a quality level to whatever makes it to intra prediction (which is the stage after inter prediction, where all the motion compensation is).
> It was a re-encode at crf=22.5, with zero encoder tuning and preset at fast. Needless to say, it was blocking and artifacts galore. The icing on the cake? The audio was FLAC “to preserve audio quality”… I’m pretty sure there are more people in this world with 1080p screens than high-end headphones lol.
Not to say the author is wrong, as their example is very drastic, but I don’t think this matter is so black-and-white.
1) Audio quality matters more than the quality of the video in terms of viewing satisfaction (this one is my subjective opinion).
2) The better your audio system, the more important the source material quality.
3) Lossy compression is a one-way road. Did the distributor re-encode from lossless, or apply their own lossy encoding on top of an already lossy stream because they couldn’t be bothered to reconfigure their tools? You don’t even have the data to tell how messed up your sound is. Meanwhile, with lossless there are straightforward ways to tell whether it’s “true lossless” (i.e., wasn’t up-coded from a lossy format).
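One such straightforward check is a spectrogram: lossy codecs typically low-pass the signal, so a hard frequency ceiling around 16-20 kHz in a "lossless" file suggests a lossy ancestor. A sketch using ffmpeg's showspectrumpic filter (file names are placeholders):

```shell
# Render a spectrogram of the audio; a sharp horizontal cutoff well below
# ~22 kHz usually means the "lossless" file was up-coded from a lossy source.
ffmpeg -i suspect.flac -lavfi "showspectrumpic=s=1920x1080:legend=1" spectrum.png
```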
4) Distributing truly lossless video is currently just plain impossible (from obtaining the requisite master files to dealing with such volumes of data), so we are choosing between “different degrees of evil” out of necessity. Distributing lossless audio ultimately is feasible, at the expense of a couple hundred megabytes.
In the end, if we’re speaking about a work of value (e.g., a good classic film), I wouldn’t mind spending extra megabytes on preserving the original audio.
As the article points out, if the source is already lossy (which is (almost) all broadcast, DVD, and web sources) then you should just copy that bitstream into the resulting video (transcoding lossy to FLAC gives you the same quality at higher bitrate and is the worst of both worlds). The only common source you would get true lossless audio from is BluRay discs and even in that case the recommended audio encoding settings the author gives should be well into transparency for all but the most golden-eared listeners, and they would likely only be able to detect the difference in an AB comparison with the original source.
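In ffmpeg terms, passing the original lossy track through untouched is just a stream copy (names and video settings here are placeholders):

```shell
# Re-encode the video but copy the existing AC3/DTS/AAC track bit-for-bit:
# no generation loss, and no FLAC-sized bitrate for already-lossy audio.
ffmpeg -i source.mkv -c:v libx265 -crf 18 -preset slow -c:a copy output.mkv
```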
Agreed that up-coding is bad. Still think preserving lossless, if it is available in the first place, is worth it, and a couple hundred megabytes probably don’t matter to someone who chose a quality download rather than streaming.
The point that “the entirety of one’s experience is fully available to one’s conscious mind, so if one can’t tell the difference then it doesn’t matter” can be a philosophical question on which we’ll disagree.
DVDs can use PCM, however every commercial release I've seen uses Dolby Digital/AC3/A52 or DTS. (Apparently PAL DVDs may also use MP2 but as an American I've never personally encountered that on a commercially-released disc.)
My bad, shouldn't generalize. It probably depends on region and genre then. I collect Japanese music DVDs (music videos, lives), 99% of them use stereo PCM.
Yeah, I mostly collect(ed) films and TV. I can imagine that the people authoring music DVDs would decide the bitrate tradeoff would be worth it to get the better quality when the focus of the disc is on the sound.
Well-compressed audio is indistinguishable from an uncompressed stream for 99% of the population using 99% of listening devices (even though those keep improving on average).
Nobody is arguing 96kbit mono should be enough for everyone obviously.
The point is that once you receive lossy audio, you can’t even verify how well it was compressed. Can you trust that it was encoded from a lossless source and not re-encoded a hundred times? Do you consider “it’s all good, unless you actually notice bad sound quality with the naked ear in the middle of the film” a sound principle from a viewing-experience standpoint?
(By the way, your numbers look like you pulled them out of nowhere, but if we are doing this then let me tell you: 99% of population will just stream. This discussion is already moot to everyone but the most picky viewers.)
>The more threads you add into a pool, the more encode overhead you will experience, since every row of encode requires the upper right CTU block to complete before it can proceed.
Nowadays, we don't have to worry about that, since we can use scene-based encoding instead. [1][2] In addition to allowing for full use of all cores, without any loss of encoding or processing efficiency, different encoding settings may be chosen for each scene, increasing potential efficiency.
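As a sketch of what scene-based (chunked) encoding looks like in practice, the third-party Av1an tool splits the input at detected scene changes and encodes the chunks in parallel. Flags below are from memory and may have changed; consult `av1an --help` before relying on them.

```shell
# Split input.mkv at detected scene changes and encode the chunks in
# parallel with 8 worker processes, using aomenc for each chunk.
av1an -i input.mkv --encoder aom --workers 8 -o output.mkv
```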
I’ve wondered about this a lot. Anime (and much hand-drawn animation in general) is typically fairly predictable. Objects, expressions, and even motions are reused frequently.
Shapes are often basic enough that a “vector” graphic could adequately represent their structure (in far fewer bytes).
If there were a way to structure anime into their core components, could we compress the video more like LZMA?
The JPEG XL bitstream format contains a "spline" primitive. Unfortunately, these don't get encoded particularly efficiently, and there aren't any encoders that can take advantage of it yet. JPEG XL can do animations, but it's not supposed to replace video codecs. It would be interesting to see similar features put into an actual video codec.
I think this would be like trying to turn a png back into an svg. The pixels on your screen might be exactly the same, but once it's rasterized, you can't really go backwards.
My experience is that VP9 is often a better choice than HEVC. The article mostly compared x265 to x264 because VP9 was not endorsed by Apple. That is no longer the case, and I've found VP9 anime videos to have better quality at similar bitrates.
Recent VPx encoders are also better at multi-threading. For instance with ffmpeg: `-c:v vp9 -row-mt 1 -crf 26`
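A fuller invocation along those lines — note that libvpx-vp9 needs `-b:v 0` for `-crf` to act as a pure constant-quality target; the tile/thread values here are illustrative, not tuned:

```shell
# Constant-quality VP9 with row-based multithreading and column tiles,
# plus Opus audio; -b:v 0 makes -crf a pure quality target.
ffmpeg -i input.mkv \
  -c:v libvpx-vp9 -crf 26 -b:v 0 -row-mt 1 -tile-columns 2 -threads 8 \
  -c:a libopus -b:a 128k output.webm
```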
Looking at a number of device manuals (my car, my Garmin watch, ...) many of them claim to only support AAC files "encoded by the Apple encoder".
I think that's because AAC supports a wide range of coding tools on paper and off-brand encoders might use them, but everybody supports the ones the Apple encoder uses.
So basically, how can we use free and open technology to make the smallest file size while trying to retain as much quality as possible?
This also doesn't touch on what the anime is being sourced from. Furthermore, it only covers encoder usage, not pre-processing the source, such as with AviSynth in some cases.
Yeah, from my experience the "hard" part is always finding the right (albeit subjective) filtering in AVS/VapourSynth depending on the source.
Most major release groups and fansub teams don't really vary their encoding parameters much.
The author is choosing not to name names, i.e. they aren't pointing a finger at the specific person/site they're criticizing.
"cough" or mumble is a slightly humorous shorthand/convention for this. It suggests an audio transcript mistake, that conveniently doesn't reveal the guilty party.
This is an interesting insight to piracy culture - I didn't know they had some kind of unofficial understanding between them on tech standards. Kind of funny!
(I think the biggest hurdle for HEVC is that it isn't available on a lot of devices. Sure, if you want to watch a video on a computer, HEVC will work. But if you want to watch the same video on your TV, chances are that it won't work. H.264 / AVC however is now near universal on most LED TVs sold).