The premise of virtual tables (Postgres FDW) is to not store the data but instea...

davesque · on Sept 30, 2022

Awesome, thanks so much for that! I guess then it sounds like, even if bandwidth were not an issue for a datasource, there would still be the physical limitations of the machine on which steampipe would do any later processing (such as joins between datasources). In other words, there's no sense in which steampipe distributes this work across multiple processes or machines. Is that correct?

Sorry if a dumb question. I could be thinking entirely in terms of the wrong paradigm here since my work in this space was primarily concerned with distributed computing and big data.

nathanwallace · on Sept 30, 2022

Steampipe computes the SQL part in a single Postgres instance (not distributed) - we've not found this to be a significant limit (so far). A difference in this case is that the query is really combining and organizing results that may have already been processed at the source. In 2022 most services are API-first and offer great searching and filtering, so we can use that to do much of the query processing in many cases.
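
To make the idea concrete, here is a toy sketch of filter pushdown followed by a local join. It is not Steampipe's code: `FakeApi`, `fetch`, `join`, and the sample rows are all hypothetical stand-ins, and the real system does this through Postgres and its plugins.

```python
# Toy illustration of filter pushdown; every name here is hypothetical.
from dataclasses import dataclass


@dataclass
class FakeApi:
    rows: list
    rows_sent: int = 0  # counts rows that crossed the "network"

    def fetch(self, **filters):
        # The source applies the filters, so only matching rows are returned.
        matched = [r for r in self.rows
                   if all(r.get(k) == v for k, v in filters.items())]
        self.rows_sent += len(matched)
        return matched


def join(left, right, key):
    # Single-process hash join over the already reduced result sets.
    index = {}
    for r in right:
        index.setdefault(r[key], []).append(r)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


instances = FakeApi([
    {"owner": "alice", "instance": "i-1", "region": "us-east-1"},
    {"owner": "bob", "instance": "i-2", "region": "eu-west-1"},
    {"owner": "carol", "instance": "i-3", "region": "us-east-1"},
])
users = FakeApi([
    {"owner": "alice", "team": "infra"},
    {"owner": "bob", "team": "data"},
    {"owner": "carol", "team": "infra"},
])

# Roughly: SELECT ... FROM instances JOIN users USING (owner)
#          WHERE region = 'us-east-1'
result = join(instances.fetch(region="us-east-1"), users.fetch(), "owner")
```

The `region` filter runs at the source, so the single local process only has to join the rows that survived it.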

davesque · on Sept 30, 2022

Great, thanks again! Yeah, a lot of the things steampipe is doing are right up my alley. Fun to learn about! Watching your talk now and enjoying it.