One big issue with gathering this kind of data over the phone is the frequency cutoff on voice-only lines—above a certain frequency (I want to say 4kHz? maybe I'm misremembering), the information is lost. It's basically as if you took a Fourier transform, zeroed out everything above the threshold, and then transformed back.[0] For humans (and even computers) trying to interpret the sound as language, that's not a huge problem, although you might lose some of the higher formants. But for an acoustic analysis that's trying to do voiceprinting—in this case to detect Parkinson's—this could be a big problem.
(I'm also irritated by the glib "99 percent success rate" but I just ranted about that on a different HN post so I won't go into it here.)
[0] Why do this? So the phone company can compress and send a lot more data over the same amount of internal bandwidth. Come to think of it, it's kind of related to how wavelet-based compression works.
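The "transform, zero out, transform back" picture above can be sketched in a few lines of Python. This is only an illustration of an idealized brick-wall low-pass, not how a phone network actually filters audio; the 16 kHz source rate, the 4 kHz cutoff, the two test tones, and all function names are assumptions chosen for the example.

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform (O(n^2), fine for a short demo)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    """Inverse of dft(); returns the real part of each sample."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]

def brickwall_lowpass(signal, sample_rate, cutoff_hz):
    """Zero every frequency bin above cutoff_hz, then transform back."""
    n = len(signal)
    spectrum = dft(signal)
    for k in range(n):
        # Bins past n/2 hold the negative frequencies, so fold them back.
        freq = min(k, n - k) * sample_rate / n
        if freq > cutoff_hz:
            spectrum[k] = 0
    return idft(spectrum)

def tone_amplitude(signal, sample_rate, freq_hz):
    """Amplitude of the sinusoid at freq_hz (assumes it falls on an exact bin)."""
    n = len(signal)
    k = round(freq_hz * n / sample_rate)
    bin_value = sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))
    return 2 * abs(bin_value) / n

SAMPLE_RATE = 16000  # assumed rate of the original recording
N = 320              # 20 ms of audio, so each bin is 50 Hz wide

# A 1 kHz tone plus a quieter 6 kHz tone standing in for high-frequency detail.
signal = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE)
          + 0.5 * math.sin(2 * math.pi * 6000 * t / SAMPLE_RATE)
          for t in range(N)]

filtered = brickwall_lowpass(signal, SAMPLE_RATE, cutoff_hz=4000)

print("1 kHz before/after:", tone_amplitude(signal, SAMPLE_RATE, 1000),
      tone_amplitude(filtered, SAMPLE_RATE, 1000))
print("6 kHz before/after:", tone_amplitude(signal, SAMPLE_RATE, 6000),
      tone_amplitude(filtered, SAMPLE_RATE, 6000))
```

The 1 kHz component survives untouched while the 6 kHz component is removed entirely, which is the kind of loss that matters if the analysis depends on high-frequency detail.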
The sampling frequency isn't itself the highest audio frequency that can be encoded. Telephones use PCM encoding (meaning the data directly represents the graph of the sound wave) at 8 kHz. By the Nyquist sampling theorem (halve the sampling rate), that allows frequencies up to 4 kHz (as you said). It's not a perfectly hard cutoff, though: the anti-aliasing filter rolls off gradually, and whatever gets through above 4 kHz aliases down to a lower frequency, so those sounds can still come through but sound pretty weird (as if you were talking on the telephone!)
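To make the Nyquist arithmetic concrete, here is a small Python sketch of what 8 kHz sampling does to a tone above 4 kHz when nothing filters it out first. The 5 kHz test tone and the helper names are assumptions for illustration; real telephone equipment applies an anti-aliasing filter before sampling.

```python
import math

SAMPLE_RATE = 8000         # telephone PCM sampling rate, in Hz
NYQUIST = SAMPLE_RATE / 2  # highest frequency that can be represented: 4000 Hz

def sample_tone(freq_hz, n_samples, sample_rate=SAMPLE_RATE):
    """Sample a unit-amplitude sine wave of freq_hz at sample_rate."""
    return [math.sin(2 * math.pi * freq_hz * t / sample_rate)
            for t in range(n_samples)]

def alias_frequency(freq_hz, sample_rate=SAMPLE_RATE):
    """Frequency a pure tone appears at after sampling with no pre-filter."""
    folded = freq_hz % sample_rate
    return min(folded, sample_rate - folded)

high = sample_tone(5000, 64)  # above the Nyquist limit
low = sample_tone(3000, 64)   # where the 5 kHz tone folds to

# The 5 kHz samples match the 3 kHz samples with the sign flipped,
# so after sampling the two tones cannot be told apart by frequency.
max_diff = max(abs(h + l) for h, l in zip(high, low))

print("Nyquist limit:", NYQUIST)
print("5 kHz shows up at:", alias_frequency(5000))
print("largest sample mismatch:", max_diff)
```

In other words, content above the Nyquist limit is not preserved as itself: it is either filtered away or folded onto a lower frequency.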
> So the phone company can compress and send a lot more data
Actually, the limits date back to the analog, circuit-switched phone lines of a century ago. Back then there was a physical wire running through switches from one phone to the other.
The quality requirements were that those lines had to pass 300 Hz to 3.4 kHz or so. That often required "pupinization" - adding inductive coils to tune the line's frequency response.
If you look at a spectrogram of your voice, there's very little power above 1.5 or 2 kHz. However, it seems the high frequency part is important to understandability, including perception of emotional overtones.
(Just the other day, playing with modems, we found a weird case - voice being pumped through before the call was considered completed - which I suspect is the persistence in digital protocols of the analog behavior of a century ago.)