Show HN: Octo – Generate a serverless API from an SQL query (octoproject.github.io)
241 points by khalidlafi on Sept 11, 2020 | 58 comments


I've got a similar project that reads your db schema and generates a Go REST API and a TypeScript/React web interface. (The code-generation is language agnostic so at some point I'd like to add at least a Java REST API as well.) It supports PostgreSQL, MySQL, and SQLite.

Unlike PostgREST/Hasura and some other dynamic tools, you can "eject" at this point if you'd like and continue development without the generator, in a language you already know. But I'm working on exposing Lua-based hooks that you could carry across whatever backend language you choose to generate, avoiding the need to eject.

It has built-in support for paginated bulk GET requests with filtering, sorting, and limiting, as well as built-in bcrypt password authentication and optional SQL filters, specified in configuration, for authorizing particular endpoints based on session and request metadata.

Still very much a work in progress but the goal is to push the envelope on application boilerplate.

Screenshots are of the example/notes project in the repo.

https://www.dbcore.org/

https://github.com/eatonphil/dbcore


I feel like projects like this work for simple stuff, but as soon as you need analytics/insights or actual business logic, you almost always need to just "roll your own" API. Am I wrong? Do other people feel this way? Can anybody think of a few projects they've worked on that would be too complex, or too much work, to build with these kinds of simple template generators?


Long-term maintainability is definitely my concern. I don't see projects like this so much as products themselves (maybe I'm myopic) but as core infrastructure. I don't want to ever write the boilerplate again, but I should be able to extend it maintainably over time (hence ejecting or Lua hooks).

My goal in building this is to allow myself to more rapidly prototype real, complex applications. It's not there yet but I've got such an application in mind, building toward support for it as I'm developing this.


It seems like when working with generators, the trick is to have the right boundaries between generated code, points where you can extend the generated code, and the API through which you use the generated code. If successful, you should never feel the need to hand edit the generated code itself, and you shouldn't need to worry too much about re-running the generator breaking things or stomping on your code.


>> you shouldn't need to worry too much about re-running the generator breaking things or stomping on your code.

Can't agree more! There are code parsers that give you a DOM so the code can be manipulated, but it is a bit of work. We are trying this concept in our framework.


I think on average you're right. As the project grows, these generators lose some of their initial value and speed.

If I were to use something like this, it'd be for rapid development initially for prototyping purposes. Then I'd transition to something more bespoke as needed.

If you know all your requirements up front, then starting bespoke from the beginning may be the better route.


With PostgREST you can add stored procedures as REST RPC calls, and you can always roll a microservice for more advanced stuff. In practice these sorts of auto-API tools make a good starting point as long as they support your authentication and authorization needs.
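For instance, a function like the following would be exposed by PostgREST as a POST to /rpc/search_notes - the notes table and the function are made-up examples, just a sketch of the pattern:

    -- Hypothetical table and function; PostgREST exposes the function at POST /rpc/search_notes
    create table notes (
      id serial primary key,
      body text not null
    );

    create or replace function search_notes(term text)
    returns setof notes
    language sql stable
    as $$
      select * from notes where body ilike '%' || term || '%';
    $$;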


What I like about postgREST is you’re leveraging a mature system for authorization (postgresql roles) rather than rolling your own outside the database and introducing a ton of code and vulnerabilities.

Not every application needs roles, but it definitely feels like the saner approach if yours does.

Also a nice side effect, depending on the team, is a significant performance boost, as you're forced to go deep into database land instead of through leaky ORM abstractions.
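As a rough sketch of the roles approach mentioned above (the role, schema, and table names are made up; the anonymous-vs-authenticated split is the usual pattern):

    -- An anonymous role that can only read, and an authenticated role that can also write
    create role web_anon nologin;
    grant usage on schema api to web_anon;
    grant select on api.notes to web_anon;

    create role web_user nologin;
    grant web_anon to web_user;
    grant insert, update on api.notes to web_user;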


Postgraphile allows you to write plugins to arbitrarily extend (GraphQL) schema and wrap resolvers, which is useful for things like interfacing with external APIs etc. It works just fine for me at my current project.


The sweet spot for things like this is for them to generate the sort of code you'd want anyway, in a way which allows you to selectively replace bits where you need additional business logic or UI customisation.

Rails' generators are a pretty good starting point here, if a little bit more verbose than I'd like. They're great for getting the boilerplatey bits off the ground, and focusing on the bits that are unique to what you're doing.


Used PostgREST once. We had a project with datasets for sale where all the data was stored in a database and didn't need any updates or transactions. PostgREST made it easy to create a self-contained API.

https://github.com/PostgREST/postgrest


DreamFactory is basically a paid service for this sort of thing. They support something like 20 types of databases (among many other data sources). They have a lot of features that make the exposed API good enough long-term. https://www.dreamfactory.com


Yeah, I have very few tables in my DB that can be updated directly without needing to pass the data through some business logic, trigger notifications, etc.


I’m using Prisma right now and they allow you to manually expose your fields, and they allow you to “resolve” any field as you see fit. If you want to run the original function, you can provide “originalResolve” and call it later. I think Prisma has a great (albeit in progress) way of doing what you’re saying.

It also integrates with graphql-codegen so you can generate code for apollo/others


This is not dissimilar to what Strapi.io does, although judging from their marketing materials, I don't think they realize that's a big selling point.

With Strapi you configure your DB and get code generated in JS that supports a standard CRUD REST API. If you want to add business logic, you can override any particular endpoint you want. Their docs even come with the default implementation for easy copy/paste.

I would love to see research in this space continue, I think it's the future of bringing non-technical people into the product development process (if you can understand building a workflow with Excel/Google Sheets/Airtable, you can understand building an API). I'm excited to check out your project.


My thoughts exactly as I’ve been exploring strapi for a recent project.

The UI allows you to build out the schema, and it comes with all the CRUD interfaces and CRUD API endpoints pre-built.

But then the code is still all there and you can add additional controller actions, model hooks, and services.

I haven’t run into any hard limitations yet; I've been very impressed.


In case anyone else is wondering what's doing the templating in this project, it's Scriban.

https://github.com/lunet-io/scriban


A port of Ruby's Liquid templates to .NET, yep.


The idea is interesting. But it looks like you end up with a YAML file that enumerates each of your tables/endpoints and the queries that back them. So are we exchanging the “complexities” of code, where we have control and testing, for the “lack of complexity” of YAML that becomes unwieldy and untestable, in the name of “simplicity”?


Don’t forget that at some point, you’ll want to generate the YAML from code, because otherwise it becomes impossible to maintain. And quickly you’ll find yourself back at square one. :)


One of the things that's not obvious to me about things like this (and other similar tools) is where/how scopes/limitations/permissions are handled. I assume they either are or can be, I just never see it spelled out clearly. What am I missing?


I can't speak for this project specifically but for some context, Postgraphile's way of solving this, as it only supports Postgres, is to use Postgres's Row Level Security feature, whereby you enforce scopes and permissions at the data layer, as well as using table grants to roles specified in JWTs and such. (https://www.graphile.org/postgraphile/postgresql-schema-desi...)
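A minimal sketch of that pattern, assuming a notes table with an owner_id column (the jwt.claims.user_id setting name follows Postgraphile's convention of exposing verified JWT claims as local settings):

    -- Rows become visible only to their owner once RLS is enabled
    alter table notes enable row level security;

    create policy notes_owner on notes
      for all
      using (owner_id = nullif(current_setting('jwt.claims.user_id', true), '')::int);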

This project doesn't seem to have any inbuilt AuthZ functionality, so if your database doesn't have it built in like Postgres does, or if you need something that's not possible in-database, I guess you just... can't.


I recently started a side project on top of Postgraphile but ended up scrapping it in favor of something I was more familiar with.

The biggest problem is that unless the main language you’re familiar with is PL/pgSQL, you’ll eventually run into the roadblock that is having everything reside in the DB.

In my case, I simply could not figure out how to use the Users table without having the password returned in all queries. I could turn off the automatically generated queries that Postgraphile made, but at that point why bother with it?

Another problem is that there isn’t a mature and well-established method of maintaining development, staging, and production database schemas on projects with more than one developer.

Graphile makes https://github.com/graphile/migrate; however, it seems incredibly brittle and is built on a workflow that will absolutely break production if you deviate from it even slightly.

I really liked using it! I just hope some of these developer ergonomics issues could be better handled.


> In my case, I simply could not figure out how to use the Users table without having the password returned in all queries

The recommended way of doing this is to store anything you don't want public in a separate table with a one-to-one relationship, and then control access to that table through computed columns and such (https://www.graphile.org/postgraphile/postgresql-schema-desi...)
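A sketch of that shape - the schema, table, and column names here are made up, but the split mirrors the one-to-one private table idea:

    -- The exposed users table holds nothing sensitive; credentials live in a private schema
    create schema if not exists app_private;

    create table users (
      id serial primary key,
      username text not null unique
    );

    create table app_private.user_secrets (
      user_id int primary key references users(id) on delete cascade,
      password_hash text not null
    );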

What I can't recommend enough to people starting with Postgraphile is to check out graphile-starter, and especially its first migration: https://github.com/graphile/starter/blob/main/%40app/db/migr...

That really helped me understand how to structure my schema better, since separating concerns within a model across different tables isn't something you need in a regular application-in-front-of-datastore type app.

> Other problems are that there isn’t a mature and well established method of maintaining development, staging, and production database schemas on projects with more than one developer.

Yes, I don't have to grapple with this issue, so I don't really have much to say about it. Graphile-migrate works great for me, but I can see how it could be an issue for larger projects, although I would think that database migrations are simply tricky on their own, regardless of Postgraphile.


You’re completely right about the starter and I feel if I had discovered it first instead of trying to roll my own migration system with Knex and stuff, I would have stuck with Postgraphile.

But instead of rebasing my project with the starter I decided to go back to what I was comfortable with and do a Rails gql api.

I’ll definitely be trying it out in the future though! I felt so incredibly productive with rolling stuff out before auth became a concern.


Yes it's been incredibly quick for me to iterate and build my current project. You are still right about migrations though, I'm not sure how that would work in a bigger team with feature PRs etc.


Also, you’d want to put the passwords/secrets table in a separate schema from the one you’re exposing Postgraphile to.


I tried this "no backend" approach but ended up writing all of my business logic in the DB schema instead of in the app, which seemed to defeat the purpose.


Well, if your application has business logic at all, you're going to have to write it somewhere, and I personally like the idea of keeping as much of it as possible co-located with the data - it makes your schema the SSOT for your data structure and its constraints and transformations.
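For example, simple rules and transformations can sit right next to the data as constraints and triggers - a sketch with made-up table and column names:

    -- A constraint enforces an invariant; a trigger keeps a derived column up to date
    alter table orders
      add constraint order_total_positive check (total_cents > 0);

    create or replace function set_updated_at() returns trigger
    language plpgsql as $$
    begin
      new.updated_at := now();
      return new;
    end;
    $$;

    create trigger orders_set_updated_at
      before update on orders
      for each row execute function set_updated_at();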

The complication, as always, is external services: sending email, validating IAP receipts, things like that. For those I like to take some of the principles from the 3factor app devised by the Hasura team, make them asynchronous, and have them performed by worker processes (in my case, with graphile-worker).


Agreed, I really liked having everything so close and tightly coupled with the DB. The integrations and business logic were the part I had to build traditionally and at that point it wasn't much more to query the DB from the backend rather than trying to learn how to script inside Postgres.


I tackled this by using SQL as the filter language where the generated code will fill in some context-specific variables like current session user id or current request object id.

This is a little limited in its current form, and I'm working to expand SQL filters to match up with HTTP codes, so you can say: this request needs a session, otherwise it's a 401; then it also needs to match another filter, otherwise it's a 403. But this other endpoint is OK to show without a session if the object being requested is marked public in the DB.

There's a lot to think through especially when extending these filters to bulk methods.

https://github.com/eatonphil/dbcore/blob/master/examples/not...
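To make the idea concrete, a filter of this style might boil down to a check like the one below - purely illustrative, not dbcore's actual configuration syntax; the parameter names are made up and would be filled in by the generated handler:

    -- 401 if there is no session; 403 unless the requester owns the note or it is public
    select 1
    from notes
    where notes.id = :request_object_id
      and (notes.user_id = :session_user_id or notes.is_public = true);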


Perhaps I'm old, but who needs an API for an SQL query? I'm not sure I understand the use case, or the advantage of something like this over a regular API call to a backend which would also allow you to do e.g. authentication. Enlighten me?


One of the best use cases for this is when you have a backend/internal system and you want other things to start interacting with it. Instead of having to write the API to interface with it, you can just use something like this, and with little effort you have an API and can talk to the database.


I think the point he was making is: why the API if you just want to talk to the db? You can connect to a SQL db over the network and protect the data with views and stored procedures.


Because you can't (easily) connect to an SQL DB from any arbitrary client, like a web browser or mobile app.


That's actually exactly what we're trying to build at Splitgraph [0]. :) We're building a "data delivery network" (DDN), which is like a CDN, except instead of forwarding HTTP requests to upstream web servers, we forward SQL queries to upstream databases.

The premise of the idea is that we can cut out the middle-man for a lot of data distribution use cases. We give you a way to deliver your data in native SQL, using the Postgres wire protocol. We've decoupled authentication from the database, so we can do it in a gateway / LB layer using PgBouncer + Lua/Python scripting. Any SQL client can connect to the public Splitgraph endpoint (as far as a client is concerned, Splitgraph is just a really big Postgres database). You can write queries referencing and joining across any of the 40k datasets on the platform.

In fact, just this week we've been working on v0.0.0 of our web client. This lets you do things like share and embed SQL queries on Splitgraph, e.g. [1] (this query actually joins across two live data portals at data.cityofchicago.org and data.cambridgema.gov).

There's also an example here of using an Observable notebook with the Splitgraph REST API [2]. It also works with the Splitgraph DDN configured as a Postgres database, but that's only supported in private notebooks for now (since normally it's a bad idea to expose your DB to the public!)

In general, we like the idea of adding more logic to the database. Tools like OP's are useful in this regard. In fact, at Splitgraph we use Postgraphile internally (along with graphql-codegen for autogenerated types) and we have nothing but good things to say about it.

[0] https://www.splitgraph.com/

[1] https://www.splitgraph.com/workspace/ddn?layout=hsplit&query...

[2] https://observablehq.com/@mbostock/hello-splitgraph


As part of our product (https://seekwell.io/), we let people access SQL results with an API key and unique endpoints per query. There's also an option to add parameters.

The main use case is giving a data scientist or another application access to the results of a few arbitrary queries without giving them full access to the database. So it's a bit like giving them access to a SQL view, but without them needing to set up a driver, etc. to connect.


In my case, I have a simple obvious use case.

I work for a large corporation. They want to implement the Bezos mandate [1]. No direct database access between teams, API abstraction for everything.

OK, now let's think in onion layers (or hexagonal/clean architecture if you like). Think of layers of services with different purposes - data services (containing no application/business logic), application services (for business logic, orchestration, process), and UI/UX services (to power differing end user experiences).

Data services don't have to do much - be the data/repository layer, expose productive interfaces for CRUD. Need to read across data sources? Think of federated data services that can combine data on the fly, perhaps like GraphQL.

These kinds of tools are perfect for the first layer of services that abstract the database world from the application world. Just simple services, even ones that effectively let you mimic what SQL queries can do (filter, sort, page, etc.). Individual record-oriented interfaces, and bulk-oriented interfaces. The query side of CQRS (Command Query Responsibility Segregation).

Many will say, "I don't need all this complexity and layers" - and sure, for smaller or simpler applications, probably not!

But, if you have to operate on any kind of larger scale, with multiple data sources, systems, etc., you end up needing the layers. And these types of tools automate some of the lower layers.

Perhaps when we talk about this, we shouldn't be focusing on "oh, it's too complicated", and should instead be building frameworks or reference architectures that automate away the complexity - so it looks easy again, but is more flexible and perhaps easier to scale.

I believe that we are still on the cusp of an almost fully defined, service-based architecture (microservices and serverless were just one part of the continuing development of that story). Federation is another part of that story, oft ignored. Thinking of the onion as service layers is another part. Erasing the network boundary as a concern through much higher-speed internetworking is another part.

Eventually we may come to see, that it is all a big "system", some parts just aren't connected to each other directly.

Sorry, got a bit rant-y at the end there :) Just passionate about sharing this world view with others - as I continue to see this architecture developing!

[EDIT] I wanted to add, it's not just that the use case for this is in a data service layer for automation - from a logical perspective I mean. In big companies such as the one I work for, we never get the resources we need, ironically. We are overwhelmed with demands, and must operate under the Bezos mandate rules. Tools such as Octo are not a panacea, but, they are a good compromise if you have to move fast, they are time-and-cost-savers. And they can get you surprisingly far.

[1] https://www.calnewport.com/blog/2018/09/18/the-human-api-man...


Looks great!

If you like this, check out OctoSQL[0]... Also in Go... Though OctoSQL lets you query multiple databases / files / event streams like Kafka using SQL from your command line, not as a server, so it's a fairly different use case, but you should check it out nevertheless!

The naming clash is funny.

[0]: https://github.com/cube2222/octosql


I really like your tool. In fact I am slowly integrating it into a solution which will expose a REST API and workspaces identified by a UUID. In our organisation it is so common to receive an Excel or CSV which you have to join with the database. OctoSQL is great for that.

I am wondering what role Badger will play in the future. It would also make a great additional KV backend, btw.


That's really great to hear!

We're considering moving to a more in-memory model, as we're not sure if the Badger storage idea was a good one and worth it.

TBH we're still not quite sure in what direction we'll be continuing. Though we're surely gonna be developing it further.

But currently we're considering a rewrite with multiple assumptions changed (column oriented).


Looks vaguely similar to http://postgrest.org/


And Hasura, Postgraphile, et al. too. These, as well as PostgREST, also give you much more flexibility in the form of plugins in library mode and other such things - they also generate the actual queries for you, via introspection, as opposed to this project, which requires you to write the query yourself.

I think there's certainly space for this project, i.e. hand-written queries on any database (Postg[REST|raphile] both only work with Postgres of course; not sure about Hasura). Not sure it will succeed without support for more forms of serverless deployment, primarily Lambda.


Nice to see OpenFaaS featured here, and thanks for your PRs to Arkade. I do wonder what your strategy is on connection pooling and authentication?

Also not keen on the passwords being kept in a plaintext file - someone will check that into git. OpenFaaS has secret support which you can use, Amal. So does Knative.


Reminds me of the venerable Datasette by Simon Willison: https://github.com/simonw/datasette


Interesting concept, and I quite liked the playful logo. Can we pass env variables to the DB connection?

We are in a similar space: we take DB connection params as input and generate CRUD APIs with auth + ACL, and the APIs are then packed into a single Lambda function. There is support for the Serverless framework as well.

[1]: https://github.com/xgenecloud/xgenecloud


Yes, for example:

? Enter the database password: ${DB_PASSWORD}


Interesting, I've built a similar project that generates GraphQL API based on your database schema - https://okdb.io


My main use for tools like these has always been prototypes or hobby one-off type stuff. For SPAs, or a sketch with a Jupyter notebook. They're great for this sort of thing because, in my experience, this used to require building some sort of API just to get a simple JSON interface to the database. It was my understanding that the purpose of these types of tools was mostly that.

Are folks using these kind of things for non-trivial production applications?


I fear that all of these "expose your DB as an API" tools like this, Postgraphile, Hasura, etc. are going to set folks up for a world of hurt down the road. Tightly coupling your end clients to your database schema can make it extremely difficult, if not impossible, to refactor your DB if you need to (which is highly likely).


I’m building a project using one of those tools. I imagine that difficulty refactoring your database is more a problem of bad schema design than of the tool. If you normalize and abstract the implementation details out into views, I can't see how refactoring would be difficult. I haven't built anything at scale with Postgraphile/Hasura, so just wondering if I'm missing anything here.


Views make it trivial to decouple what a query returns from the underlying schema.
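For example, the exposed API can consume a view, so the underlying tables stay free to change - a sketch with made-up schema and table names:

    -- Clients query api.orders; the internal tables behind it can be refactored freely
    create schema if not exists api;

    create view api.orders as
      select o.id, o.created_at, c.email as customer_email
      from internal.orders o
      join internal.customers c on c.id = o.customer_id;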


Looks like someone took this tweet literally https://twitter.com/davecheney/status/1296033304756404225


Logo is looking good, gj.


Do you know any similar tool which supports group-by queries?


Very impressive, great job.


Your timing is perfect.


Very interesting project, and it can be scalable. Keep up the good work.



