An aside, which doesn't necessarily affect your reasoning about Books...
You didn't mention that GOOG-411 (I still have the t-shirt and other schwag) also had rampant abuse, had seen legitimate traffic shift to smartphones and, last but not least, had awful audio quality, so it wasn't just cut off for no good reason. The data set is not as valuable as it might appear at first. The Google Cloud Speech API documentation recommends 16 kHz, 16-bit samples, not the 8 kHz, 8-bit PCM (at best) you get from DS0/POTS.
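To put rough numbers on that gap, here is a back-of-the-envelope sketch. The helper name `pcm_stats` is made up for illustration, and it treats both formats as plain uncompressed mono PCM, ignoring companding and line noise:

```python
def pcm_stats(sample_rate_hz: int, bits_per_sample: int) -> dict:
    """Raw bitrate and highest representable frequency for mono PCM."""
    return {
        "bitrate_bps": sample_rate_hz * bits_per_sample,
        # Nyquist limit: half the sample rate
        "nyquist_hz": sample_rate_hz // 2,
    }

ds0 = pcm_stats(8_000, 8)         # telephone-grade channel
wideband = pcm_stats(16_000, 16)  # the recommended format

print(ds0)
print(wideband)
print(wideband["bitrate_bps"] / ds0["bitrate_bps"])
```

So the phone channel carries a quarter of the raw bits and nothing above 4 kHz, which is roughly why telephone-grade recordings are a weaker training source than they might look.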
The speech corpus being collected was not a secret at all, either.