There is an interactive playground there as well: https://qrlew.github.io/dp
You can change things on the left, and the rewritten query shows up on the right.
It does; I got this running easily because, fortunately, I have a current postgres and psycopg2 installed, so there were no tangential problems to interfere with the main event.
So clearly the example shows how I can issue a SELECT statement against the raw data. Is there a concise statement about what the DP'd statement offers me? This isn't access control or raw data obfuscation in the traditional sense. What is the consumable use case here as opposed to, say, the statistically private use?
Imagine you want to open access to your DB to someone - Alice - whom you do not particularly trust, and you want to make sure Alice cannot learn anything about any individual in the database.
Maybe you will filter the queries, allowing only the ones you assume are safe, say aggregation queries. But to prevent GROUP BYs with singleton groups, you will enforce some additional constraints: let's say groups must have a minimum size of 10 elements. But then Alice could run an aggregation over a group of 100, and then the same aggregation over that group minus Bob; by comparing the two responses, Alice can learn things about Bob.
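This differencing attack is easy to demonstrate with made-up data (the names and salary figures below are purely illustrative). Both queries respect the minimum-group-size rule, yet their difference pins down Bob exactly:

```python
# Fabricated salary table: 99 other users plus Bob.
salaries = {f"user{i}": 50_000 + i * 100 for i in range(99)}
salaries["Bob"] = 90_000

# Query 1: SUM over the full group of 100 rows (well above the 10-row minimum).
total_all = sum(salaries.values())

# Query 2: same SUM excluding Bob (99 rows, so still "safe" under the rule).
total_without_bob = sum(v for k, v in salaries.items() if k != "Bob")

# Neither query targeted Bob, but their difference is his exact salary.
bob_salary = total_all - total_without_bob
print(bob_salary)  # 90000
```

No per-query filtering rule can close this gap, because each query is individually harmless; only the combination leaks.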
Then you will design more involved rules, grow a complex rule set, eventually add human supervision, and your system will not scale.
With DP, you add calibrated noise to each query result, so that you have a mathematical guarantee that nothing substantial can be learned about Bob.
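As a minimal sketch of what "calibrated noise" means (this is the generic Laplace mechanism, not Qrlew's actual implementation): a COUNT changes by at most 1 when one person is added or removed, so adding Laplace noise with scale 1/epsilon to the count gives epsilon-DP.

```python
import math
import random

def dp_count(true_count: int, epsilon: float) -> float:
    """Noisy COUNT under the Laplace mechanism.

    COUNT has sensitivity 1 (one individual changes it by at most 1),
    so Laplace noise with scale 1/epsilon yields epsilon-DP.
    Noise is drawn by inverse-CDF sampling of Laplace(0, 1/epsilon).
    """
    u = random.random() - 0.5  # uniform in [-0.5, 0.5)
    noise = -(1.0 / epsilon) * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return true_count + noise
```

With this in place, Alice's two differencing queries each come back noisy, and their difference is noise-dominated rather than Bob's exact contribution.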
Imagine you want to open your DB to the outside world in a scalable way. You can design a service that:
- receives SQL queries
- uses Qrlew to rewrite them
- runs the rewritten query on any DB and sends back the result as a response
- accumulates privacy loss in a privacy accountant, so that the amount of private information leaked by the repeated disclosure of DP query results stays bounded
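The accounting step in that last bullet can be sketched in a few lines. This is a hypothetical epsilon-only accountant using basic composition (privacy losses simply add up), not Sarus's or Qrlew's actual accountant; real systems typically track (epsilon, delta) budgets with tighter composition theorems.

```python
class SimpleAccountant:
    """Hypothetical privacy accountant: basic (additive) composition of epsilon.

    Each answered query spends part of a fixed privacy budget; once the
    budget is exhausted, further queries are rejected.
    """

    def __init__(self, budget: float):
        self.budget = budget  # total epsilon the service may ever spend
        self.spent = 0.0

    def charge(self, epsilon: float) -> bool:
        """Record the spend and return True if the query fits in the budget."""
        if self.spent + epsilon > self.budget:
            return False  # refuse to run the query: budget would be exceeded
        self.spent += epsilon
        return True
```

Usage: with a total budget of 1.0, two queries at epsilon 0.4 are answered and a third is refused, which is exactly how the repeated-disclosure leak stays bounded.

```python
acct = SimpleAccountant(budget=1.0)
assert acct.charge(0.4)       # first query answered
assert acct.charge(0.4)       # second query answered
assert not acct.charge(0.4)   # third query refused: would exceed the budget
```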
That's what Sarus does as a company. Qrlew is the DP-SQL core we use.