For many industry-specific or specialized task models (e.g. recognizing dangerous events in a self-driving car scenario), having appropriate data is often the big secret sauce. For the specific case of LLMs, however, reasonable and sufficiently large datasets are available to the public, and even the specific RLHF adaptations aren't a limiting secret sauce, because there are techniques to extract them from the available commercial models.