HW failures are rare. At least hardware failures that matter. Disks in a RAID set dying or redundant power supply failures are not critical, and even those are more rare than you would generally expect them to be. With a bit of standardization it's incredibly cheap to keep a pool of spares handy and RMA the failed components at a leisurely pace.
Plus, you're still engineering your applications to be just as fault-tolerant as if they were running in cloud, right? The only difference is you are not paying the virtualization overhead tax. A single server dying should leave you in a no less redundant state than a single VM dying. They should also be nearly as easily deployable.
This is based off my personal experience in datacenters with 5,000-10,000 installed servers. Anything other than a PSU or HDD failure is exceedingly rare.
Plus, you're still engineering your applications to be just as fault-tolerant as if they were running in cloud, right? The only difference is you are not paying the virtualization overhead tax. A single server dying should leave you in a no less redundant state than a single VM dying. They should also be nearly as easily deployable.
This is based off my personal experience in datacenters with 5,000-10,000 installed servers. Anything other than a PSU or HDD failure is exceedingly rare.