The reason the nearest neighbour interpolation can sound better is that the aliasing fills the higher frequencies of the audio with a mirror image of the lower frequencies. While humans are less sensitive to higher frequencies, you still expect them to be there, so some people prefer the "fake" detail from aliasing to those frequencies just being outright missing in a more accurate sample interpolation.

It's basically doing an accidental and low-quality form of spectral band replication: https://en.wikipedia.org/wiki/Spectral_band_replication which is used in modern codecs.
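
To see the "fake" detail directly, here's a rough numpy sketch (the tone frequency and sample rates are made up for illustration, and this only shows the upsampling half of a real resampler): nearest-neighbour upsampling a pure tone already puts copies of it in the previously empty high band.

  import numpy as np

  IN_RATE, UP, TONE = 10_000, 4, 3_000   # made-up rates/frequency, just for illustration
  t = np.arange(IN_RATE) / IN_RATE       # 1 second of signal
  x = np.sin(2 * np.pi * TONE * t)

  y = np.repeat(x, UP)                   # nearest-neighbour upsample to 40 kHz
  mags = np.abs(np.fft.rfft(y)) / len(y)
  freqs = np.fft.rfftfreq(len(y), 1 / (IN_RATE * UP))

  # On top of the original 3 kHz tone, "fake" copies of it sit around every
  # multiple of the old rate (7, 13, 17 kHz, ...), only attenuated by the
  # hold filter's sinc-shaped response.
  print(np.round(freqs[mags > 0.01]))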

It's actually the other way round: Aliasing fills the lower frequencies with a mirror image of the higher frequencies. So where do the higher frequencies come from? From the upsampling that happens before the aliasing. _That_ makes the higher frequencies contain (non-mirrored!) copies of the lower frequencies. :-)

Just so that my wrongness isn't there for posterity: This is wrong for a real-valued signal (which is what we're discussing here). I had forgotten about the negative frequencies. So there _is_ a mirror coming from the upsampling. Sorry. :-)

Oh yes, you're correct; imaging would be the correct term for what's happening, I think (aliasing is high -> low and imaging is low -> high)?

I think I've heard the word “images” being used for these copies, yes.
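
A tiny numpy sketch of the two directions, in case it helps (all rates and tone frequencies made up for illustration):

  import numpy as np

  RATE = 20_000
  t = np.arange(RATE) / RATE             # 1 second

  # Imaging (low -> high): zero-stuff a 3 kHz tone from 20 kHz up to 60 kHz.
  # Copies appear at 20k - 3k = 17 kHz (the mirrored one, from the negative
  # frequency) and 20k + 3k = 23 kHz (the non-mirrored one).
  low = np.sin(2 * np.pi * 3_000 * t)
  stuffed = np.zeros(3 * len(low))
  stuffed[::3] = low
  f_up = np.fft.rfftfreq(len(stuffed), 1 / (3 * RATE))
  mags = np.abs(np.fft.rfft(stuffed)) / len(stuffed)
  print(np.round(f_up[mags > 0.01]))     # prints ~[3000, 17000, 23000]

  # Aliasing (high -> low): take a 17 kHz tone at 60 kHz and keep every 3rd
  # sample (20 kHz rate). It folds down to 20k - 17k = 3 kHz.
  t3 = np.arange(3 * RATE) / (3 * RATE)
  high = np.sin(2 * np.pi * 17_000 * t3)
  down = high[::3]
  f_dn = np.fft.rfftfreq(len(down), 1 / RATE)
  mags = np.abs(np.fft.rfft(down)) / len(down)
  print(np.round(f_dn[mags > 0.01]))     # prints ~[3000]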

Interpolation is a bit of a confusing topic, because the most efficient implementation is not the one that lends itself most easily to frequency analysis. But pretty much any rate change (be it up or down) using interpolation can be expressed equivalently using the following set of operations and appropriately chosen M and N:

  1. Increase the rate by inserting M zeros between each sample. This has the effect of creating the “images” as discussed.
  2. Apply a filter to the resulting signal. For instance, for nearest neighbor this is [1 1 1 … 0 0 0 0 0 …], with (M+1) ones and then just zeroes; effectively, every output sample is the sum of the previous M+1 samples of the zero-stuffed signal (at most one of which is non-zero, so it just holds the most recent original sample). This removes some of the original signal and then much more of the images.
  3. Decrease the rate by taking every Nth sample and discarding the rest. This creates aliasing (higher frequencies wrap down to lower, possibly multiple times) as discussed.
The big difference between interpolation methods is the filter in #2. E.g., linear interpolation is effectively the same as a triangular filter, and will filter somewhat more of the images but also more of the original signal (IIRC). More fancy interpolation methods have more complicated shapes (windowed sinc, etc.).
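
For concreteness, here is a deliberately inefficient numpy sketch of those three steps for the nearest-neighbor case (the function name is made up, and a real implementation would combine the steps instead of materializing the zero-stuffed signal):

  import numpy as np

  def resample_nearest(x, M, N):
      # 1. Insert M zeros after every sample, raising the rate by a factor of (M+1).
      up = np.zeros(len(x) * (M + 1))
      up[::M + 1] = x
      # 2. Zero-order-hold filter: (M+1) ones. Each output sample is the sum of the
      #    previous (M+1) zero-stuffed samples, i.e. it just holds the most recent
      #    original sample.
      filtered = np.convolve(up, np.ones(M + 1))[:len(up)]
      # 3. Keep every Nth sample, lowering the rate by a factor of N. This is the
      #    step where aliasing can fold frequencies (including the images) down.
      return filtered[::N]

  # e.g. resample_nearest(x, 2, 1) triples the rate with sample-and-hold.

Swapping the block of ones in step 2 for a triangular kernel gives linear interpolation; a windowed sinc gives the fancier methods. A real resampler wouldn't build the zero-stuffed signal at all (it would do the equivalent polyphase computation), but the output is the same.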

This also shows why it's useful to have some headroom in your signal to begin with, e.g. a CD-quality signal could represent up to 22.05 kHz but only has (by spec) actual signal up to 20 kHz, so that it's easier to design a filter that keeps the signal but removes the images.
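
To put numbers on it with a scipy sketch (the upsampling factor and tap count are arbitrary, and remez is just one way to design such a filter): if the content really stops at 20 kHz, an anti-imaging filter gets a transition band from 20 kHz up to 24.1 kHz, where the first image of that content starts, instead of having to brick-wall between 20 and 22.05 kHz.

  from scipy import signal

  IN_RATE = 44_100
  UP = 4                        # arbitrary upsampling factor for the example
  FS = IN_RATE * UP             # rate after zero-stuffing

  # Passband up to 20 kHz (where the content ends), stopband from
  # 44.1k - 20k = 24.1 kHz (where the first image of that content lands).
  # The 4.1 kHz transition band is what the headroom buys you.
  taps = signal.remez(101, [0, 20_000, 24_100, FS / 2], [1, 0], fs=FS)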


And also, to add to the actual GBA discussion: If you think the resulting sound is too muffled, as many here do, you can simply substitute a filter with a higher cutoff (or less steep slope). E.g., you could use a fixed 12 kHz lowpass filter (or something like cutoff=min(rate/2, 12000)), instead of always setting the cutoff exactly at the estimated input sample rate. (In a practical implementation, the coefficients would still depend on the input rate.)
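
In code, the cutoff rule might look something like this scipy sketch (the output rate, tap count and function name are all made up, not from the article; only the cutoff choice is the point):

  from scipy import signal

  OUT_RATE = 48_000             # hypothetical output/mixing rate
  NUM_TAPS = 63                 # arbitrary

  def design_lowpass(rate_hz):
      # Fixed 12 kHz cutoff, but never above half the estimated rate.
      cutoff = min(rate_hz / 2, 12_000)
      # firwin stands in here for whatever filter design the emulator
      # actually uses; the coefficients still depend on the input rate.
      return signal.firwin(NUM_TAPS, cutoff, fs=OUT_RATE)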


