There is an interactive playground there as well: https://qrlew.github.io/dp
You can change things on the left, and the rewritten query shows up on the right.
It does; I got this running easily because, fortunately, I have a current postgres and psycopg2 installed, so there were no tangential problems to interfere with the main event.
So clearly the example shows how I can issue a SELECT statement against the raw data. Is there a concise statement about what the DP'd statement offers me? This isn't access control or raw data obfuscation in the traditional sense. What is the consumable use case here as opposed to, say, the statistically private use?
Imagine you want to open access to your DB to someone - Alice - whom you do not particularly trust, and you want to make sure Alice cannot learn anything about any individual in the database.
Maybe you will filter the queries, allowing only the ones you assume are safe, say aggregation queries. But to prevent GROUP BYs with singleton groups, you will enforce some additional constraints: let's say groups must have a minimum size of 10 elements. But then Alice could run an aggregation over a group of 100, and then the same aggregation over that group minus Bob; by comparing the two responses, Alice can learn things about Bob.
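This differencing attack is easy to demonstrate with made-up data (the names and salary figures below are purely illustrative). Both queries respect the minimum-group-size rule, yet their difference pins down Bob exactly:

```python
# Fabricated salary table: 99 other users plus Bob.
salaries = {f"user{i}": 50_000 + i * 100 for i in range(99)}
salaries["Bob"] = 90_000

# Query 1: SUM over the full group of 100 rows (well above the 10-row minimum).
total_all = sum(salaries.values())

# Query 2: same SUM excluding Bob (99 rows, so still "safe" under the rule).
total_without_bob = sum(v for k, v in salaries.items() if k != "Bob")

# Neither query targeted Bob, but their difference is his exact salary.
bob_salary = total_all - total_without_bob
print(bob_salary)  # 90000
```

No per-query filtering rule can close this gap, because each query is individually harmless; only the combination leaks.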
Then you will design more involved rules, grow a complex rule set, eventually add human supervision, and your system will not scale.
With DP, you add calibrated noise to each query result, so that you have a mathematical guarantee that nothing substantial can be learned about Bob.
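As a minimal sketch of what "calibrated noise" means (this is the generic Laplace mechanism, not Qrlew's actual implementation): a COUNT changes by at most 1 when one person is added or removed, so adding Laplace noise with scale 1/epsilon to the count gives epsilon-DP.

```python
import math
import random

def dp_count(true_count: int, epsilon: float) -> float:
    """Noisy COUNT under the Laplace mechanism.

    COUNT has sensitivity 1 (one individual changes it by at most 1),
    so Laplace noise with scale 1/epsilon yields epsilon-DP.
    Noise is drawn by inverse-CDF sampling of Laplace(0, 1/epsilon).
    """
    u = random.random() - 0.5  # uniform in [-0.5, 0.5)
    noise = -(1.0 / epsilon) * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return true_count + noise
```

With this in place, Alice's two differencing queries each come back noisy, and their difference is noise-dominated rather than Bob's exact contribution.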
Imagine you want to open your DB to the outside world in a scalable way. You can design a service that:
- receives SQL queries
- uses Qrlew to rewrite them
- runs the rewritten query on any DB and sends back the result as a response
- accumulates privacy loss in a privacy accountant, so that the amount of private information leaked by the repeated disclosure of DP query results stays bounded
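The accounting step in that last bullet can be sketched in a few lines. This is a hypothetical epsilon-only accountant using basic composition (privacy losses simply add up), not Sarus's or Qrlew's actual accountant; real systems typically track (epsilon, delta) budgets with tighter composition theorems.

```python
class SimpleAccountant:
    """Hypothetical privacy accountant: basic (additive) composition of epsilon.

    Each answered query spends part of a fixed privacy budget; once the
    budget is exhausted, further queries are rejected.
    """

    def __init__(self, budget: float):
        self.budget = budget  # total epsilon the service may ever spend
        self.spent = 0.0

    def charge(self, epsilon: float) -> bool:
        """Record the spend and return True if the query fits in the budget."""
        if self.spent + epsilon > self.budget:
            return False  # refuse to run the query: budget would be exceeded
        self.spent += epsilon
        return True
```

Usage: with a total budget of 1.0, two queries at epsilon 0.4 are answered and a third is refused, which is exactly how the repeated-disclosure leak stays bounded.

```python
acct = SimpleAccountant(budget=1.0)
assert acct.charge(0.4)       # first query answered
assert acct.charge(0.4)       # second query answered
assert not acct.charge(0.4)   # third query refused: would exceed the budget
```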
That's what Sarus does as a company. Qrlew is the DP-SQL core we use.