You can also take the same route as sel4. They have a formal, machine readable specification of what their C code is supposed to do, and then verify that the output of the C compiler matches that specification. It's a nice 'have your cake and eat it too' situation where you can have formally verified, high assurance binaries that were compiled with gcc -O2, but with gcc removed from the trusted computing base.
sel4 is also the work of a very small handful of people, with a lot of the verification code hand built. A huge chunk of the work needn't be redone, just like you wouldn't need to rewrite a c compiler for every c project.
So yes, they kept the verified code small, so that they could focus on all of the tooling. I'm unconvinced that formal methods intrinsically can't scale.
Unfortunately I'm also not sure if formal methods can scale. In large projects the area is almost completely untouched. From what I can tell those who are using it in large projects aren't talking. (safety critical and military applications both tend to be places where you don't talk about what you have done). I'm looking into proposing to management that it is a solution to our quality problems, but I don't want to look like a head in the clouds guy with no business sense. (We already have a good automated test culture, but they can only prove everything we thought of works, we have lots of issues with situations we didn't think about)
I'd say that the generic tooling isn't quite there yet for most business needs. The sel4 guys have a lot of inrastructure that'd need to be decoupled a little from their use case to reuse.
Also, I'd add that when you say
> We already have a good automated test culture, but they can only prove everything we thought of works, we have lots of issues with situations we didn't think about
in a lot of ways formal verification only helps you with situations you know about, specifically problem classes that you know about. For instance sel4 was susceptible to Meltdown.
interesting and useful point of view. One of our most common problems right now is importing bad data. I'm hoping a formal method could "somehow" force us to find all the places where we act on data and ensure we validated it first. This is a known class of problem, but it is easy to overlook that some file might be mostly valid but have one field that is incorrect.