Basically, the client connects to the SIP server over an SSL connection that is authenticated on both sides.
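For anyone unfamiliar with "authenticated on both sides" (mutual TLS), here is a minimal sketch using Python's standard ssl module. It is only an illustration: the CA/certificate/key file paths are hypothetical parameters, and I'm not describing the actual TLS stack or settings of the product. The key point is that the server sets verify_mode to CERT_REQUIRED, so a client without a certificate signed by the registration CA fails the handshake.

```python
import ssl
from typing import Optional


def make_server_context(ca_file: Optional[str] = None,
                        cert_file: Optional[str] = None,
                        key_file: Optional[str] = None) -> ssl.SSLContext:
    """SIP server side: present our certificate and demand one from the client."""
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # Mutual authentication: reject clients that present no valid certificate.
    ctx.verify_mode = ssl.CERT_REQUIRED
    if ca_file:
        # Hypothetical path to the CA that signs the client CSRs.
        ctx.load_verify_locations(cafile=ca_file)
    if cert_file and key_file:
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx


def make_client_context(ca_file: Optional[str] = None,
                        cert_file: Optional[str] = None,
                        key_file: Optional[str] = None) -> ssl.SSLContext:
    """Client side: verify the server and present the client's own certificate."""
    # SERVER_AUTH contexts verify the server cert and check the hostname by default.
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    if cert_file and key_file:
        # The certificate the registration service issued from the client's CSR.
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx
```

A socket would then be wrapped with ctx.wrap_socket(sock, server_side=True) on the server, and with ctx.wrap_socket(sock, server_hostname=...) on the client.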
When placing a call, clients A and B negotiate the SRTP session key using a Diffie-Hellman (DH) key exchange. This is done over SIP (and not over the RTP channel, as in ZRTP).
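A rough sketch of the idea: the two clients exchange DH public values (which SIP headers or SDP fields carry them isn't stated here) and each derives the same SRTP master key and salt locally, so the server relaying the messages never learns them. Assumptions: a toy group (the Mersenne prime 2^521 - 1 with generator 3) stands in for real vetted parameters, HKDF-SHA256 (RFC 5869) is my choice of KDF, the call_id context string is hypothetical, and the 16-byte key / 14-byte salt sizes match SRTP's default AES-CM-128 suite. The sketch also leaves out how the public values are authenticated, which is what actually stops someone on the signaling path from sitting in the middle.

```python
import hashlib
import hmac
import secrets

# Toy group for illustration only; a real system would use vetted parameters
# (e.g. an RFC 3526 MODP group) or X25519.
P = 2**521 - 1
G = 3


def dh_keypair():
    """Return (private exponent in [2, P-2], public value G^x mod P)."""
    priv = secrets.randbelow(P - 3) + 2
    return priv, pow(G, priv, P)


def dh_shared(priv, peer_pub):
    """Compute the shared secret after a basic range check on the peer's value."""
    if not 1 < peer_pub < P - 1:
        raise ValueError("invalid peer public value")
    return pow(peer_pub, priv, P)


def hkdf_sha256(ikm, info, length, salt=b""):
    """RFC 5869 extract-then-expand with SHA-256."""
    prk = hmac.new(salt or b"\x00" * 32, ikm, hashlib.sha256).digest()
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def srtp_keying_material(shared, call_id):
    """Derive a 128-bit SRTP master key and 112-bit master salt (hypothetical labels)."""
    secret_bytes = shared.to_bytes((P.bit_length() + 7) // 8, "big")
    okm = hkdf_sha256(secret_bytes, b"srtp|" + call_id.encode(), 30)
    return okm[:16], okm[16:]


if __name__ == "__main__":
    a_priv, a_pub = dh_keypair()  # client A
    b_priv, b_pub = dh_keypair()  # client B
    key_a = srtp_keying_material(dh_shared(a_priv, b_pub), "call-1")
    key_b = srtp_keying_material(dh_shared(b_priv, a_pub), "call-1")
    assert key_a == key_b  # both ends derive identical keys without sending them
```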
Upon registration, each client generates a public/private key pair and submits a CSR to the registration service, which signs it and stores the public key in the SIP server's DB. That public key is later used to authenticate the SSL connections mentioned above.
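The server-side bookkeeping could look roughly like this: store the public key taken from the signed CSR, then check the key presented on a later SSL connection against it. The sqlite3 table, column names and helper functions are hypothetical, and the CSR creation and signing steps themselves (normally done with a crypto library such as OpenSSL) are left out; the sketch just takes the public key as DER bytes. Note that only the public key is stored, consistent with the server never seeing the private key.

```python
import hashlib
import hmac
import sqlite3


def open_key_db(path=":memory:"):
    """Open (or create) a hypothetical table mapping users to their public keys."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS client_keys ("
        "user TEXT PRIMARY KEY, pubkey_der BLOB NOT NULL, fingerprint TEXT NOT NULL)"
    )
    return db


def register_public_key(db, user, pubkey_der):
    """Store the public key from the signed CSR along with its SHA-256 fingerprint."""
    fp = hashlib.sha256(pubkey_der).hexdigest()
    with db:  # commits the transaction on success
        db.execute(
            "INSERT OR REPLACE INTO client_keys VALUES (?, ?, ?)",
            (user, pubkey_der, fp),
        )
    return fp


def is_registered_key(db, user, presented_der):
    """Check a key presented on a later connection against the stored one."""
    row = db.execute(
        "SELECT fingerprint FROM client_keys WHERE user = ?", (user,)
    ).fetchone()
    if row is None:
        return False
    presented_fp = hashlib.sha256(presented_der).hexdigest()
    return hmac.compare_digest(row[0], presented_fp)  # constant-time comparison
```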
The server has access neither to the client's private key nor to the SRTP session key.
Yes, with the cooperation of the CA, a MITM attack is still possible. However, we will provide the server code to especially paranoid clients so they can build and run the software on their own machines. This way they have guarantees against certificate tampering.
And we're working on an alternative solution in which even a cooperating CA would not make a MITM attack possible.
Well, this tech is derived from a project that was designed to meet the specs of one of our clients.
We did propose ZRTP to the client during the design phase, but their security analysts decided against it. They assert that, given the current state of the art in speech recognition and synthesis, ZRTP can be vulnerable to impersonation during the short-code validation phase by an attacker with sufficient resources.
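To make that concern concrete, here is a simplified sketch of why the short code matters: each side derives a short code from its DH shared secret, and the users read it aloud to each other. With a man in the middle, the two callers end up with different secrets and therefore (almost always) different codes, so the attack is caught unless the attacker can convincingly fake the voices reading them, which is exactly the scenario the analysts worried about. This is not RFC 6189's actual SAS derivation or rendering; the 4-digit code, the hash and the toy prime are assumptions for illustration.

```python
import hashlib
import secrets

# Toy DH group for illustration only (Mersenne prime 2**521 - 1, generator 3).
P = 2**521 - 1
G = 3


def keypair():
    """Return (private exponent, public value)."""
    x = secrets.randbelow(P - 3) + 2
    return x, pow(G, x, P)


def short_code(shared: int) -> str:
    """Simplified short code: 4 decimal digits from SHA-256 of the shared secret."""
    digest = hashlib.sha256(shared.to_bytes(66, "big")).digest()
    return f"{int.from_bytes(digest[:4], 'big') % 10_000:04d}"


def honest_call():
    """Both callers compute the same secret, so their codes match."""
    a, a_pub = keypair()
    b, b_pub = keypair()
    return short_code(pow(b_pub, a, P)), short_code(pow(a_pub, b, P))


def mitm_call():
    """The attacker runs one exchange with each caller, so the secrets differ."""
    a, a_pub = keypair()
    b, b_pub = keypair()
    m1, m1_pub = keypair()  # attacker's key pair toward caller A
    m2, m2_pub = keypair()  # attacker's key pair toward caller B
    secret_a = pow(m1_pub, a, P)
    secret_b = pow(m2_pub, b, P)
    # The attacker knows both secrets, but they are not equal to each other.
    assert pow(a_pub, m1, P) == secret_a and pow(b_pub, m2, P) == secret_b
    return short_code(secret_a), short_code(secret_b)


print("honest:", *honest_call())  # same code on both ends
print("MITM:  ", *mitm_call())    # almost always two different codes
```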
I'm personally doubtful, but one thing I'm sure about is that this client's security experts have access to information and resources that are not available to me.