I once was responsible for migrating a legacy business app to Azure, and the app had a local MSSQL server co-running with the app (the same pattern that Litestream is using).
As has been mentioned below, the app had been developed assuming local access (and thus <1ms latency), so it had a ton of N+1 queries everywhere.
This made it almost impossible to migrate/transition to another configuration.
So, if this style of app hosting doesn't take off and you're at all worried about it becoming a dead-end storage setup once you reach a certain scale, I'd recommend not doing this; otherwise your options will be very limited.
Then again - I bet you could get very, very far on a single box, so maybe it'd be a non-factor! :)
Single instance is underappreciated in general. There's a used server reseller near me, and sometimes I check their online catalogue out of curiosity. For only $1000ish I could have a few-generations-old box with dual-socket 32-core chips and 1TB of RAM. I don't have any purpose for which I'd need that, but it's surprisingly cheap if I did. And things can scale up from there. AWS will charge you per month, forever, roughly what it costs to buy one of your own outright - not counting electricity or hard drives.
I run my entire business on a single OVH box that costs roughly $45/month. It has plenty of headroom for growth. The hardest part is getting comfortable with k8s (still worth it for a single node!) but I’ve never had more uptime and resiliency than I do now. I was spending upwards of $800/mo on AWS a few years ago with way less stability and speed. I could set up two nodes for availability, but it wouldn’t really gain me much. Downtime in my industry is expected, and my downtime is rarely related to my web services (externalities). In a worst case scenario, I could have the whole platform back up in under 6 hours on a new box. Maybe even faster.
* Flexible builds, rigid deployments (can build anywhere, deploy from anywhere, and roll out deployments safely with zero downtime)
* Images don't randomly disappear (ran into this all the time with dokku and caprover)
* If something goes wrong, it heals itself as best it can
* Structured observability (i.e. logs, metrics, etc. are easy to capture, unify, and ship to places)
* Very easy to setup replicas to reduce load on services or have safe failovers
* Custom resource usage (I can give some pods higher or lower CPU/memory limits depending on scale and priority)
* Easy to self-host FOSS services (queues, dbs, observability, apps, etc.)
* Total flexibility when customizing ingress/routing. I can keep private services private and only expose public services
* Certbot can issue SSL certs instantly (I always ran into cert issues with other self-hosting platforms)
* Tailscale Operator makes accessing services a breeze (can opt-in services one by one)
* Everything is yaml, so easy to manipulate
* Adding new services is a cakewalk - as easy as creating a new yaml file, building an image, and pushing it. I'm no longer disincentivized to spin up a new codebase for something small but worthwhile, because it's easy to ship it.
All-in-all I spent many years trying "lightweight" deployment solutions (dokku, elastic beanstalk, caprover, coolify, etc.) that all came with the promise of "simple" but ended up being infinitely more of a headache to manage when things went wrong. Even something like heroku falls short because it's harder to just spin up "anything" like a stateful service or random FOSS application. Dokku was probably the best, but it always felt somewhat brittle. Caprover was okay. And coolify never got off the ground for me. Don't even get me started on elastic beanstalk.
I would say the biggest downside is that managing databases yourself comes with fewer guardrails than something like RDS, but the flip side is that my DB is far more performant and far cheaper (I own the CPU cycles! no noisy neighbors), and I still run daily backups to external object storage.
Once you get k8s running, it kind of just works. And when I want to do something funky or experimental (like splitting AI bots to separate pods), I can go ahead and do that with ease.
I run two separate k8s "clusters" (both single node) and I kind of love it. k9s (obs. tool) is amazing. I built my own logging platform because I hated all the other ones, might release that into its own product one day (email in my profile if you're interested).
Any notes or pointers on how to get comfortable with k8s? For a simple nodejs app I was looking down the pm2 route, but I wonder if learning k8s is just more future-proof.
Use K3s in cluster mode and start doing. Cluster mode uses etcd instead of kine; kine is not good.
Configure the init flags to disable all the bundled controllers and other doodads, then deploy them yourself with Helm. Helm sucks to work with, but someone has already gone through the pain for you.
AI is GREAT at K8s, since K8s has GREAT docs which it has been trained on.
A good mental model helps: it's an API with a bunch of control loops.
Definitely a big barrier to entry, my way was watching a friend spin up a cluster from scratch using yaml files and then copying his work. Nowadays you have claude next to you to guide you along, and you can even manage the entire cluster via claude code (risky, but not _that_ risky if you're careful). Get a VPS or dedicated box and spin up microk8s and give it a whirl! The effort you put in will pay off in the long run, in my humble opinion.
Use k9s (not a misspelling) and headlamp to observe your cluster if you need a gui.
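If it helps demystify things: once the cluster is up, a whole service is basically one yaml file. Here's a minimal sketch (app name, image, and ports are all hypothetical):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello-node
spec:
  replicas: 1
  selector:
    matchLabels:
      app: hello-node
  template:
    metadata:
      labels:
        app: hello-node
    spec:
      containers:
        - name: app
          image: registry.example.com/hello-node:v1   # hypothetical image
          ports:
            - containerPort: 3000
---
apiVersion: v1
kind: Service
metadata:
  name: hello-node
spec:
  selector:
    app: hello-node
  ports:
    - port: 80
      targetPort: 3000

kubectl apply -f hello-node.yaml and the control loops mentioned above take it from there.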
I guess you got cheap power. Me too, but not 24/7 and not a whole lot (solar). So old enterprise hardware is a no-go for me. I do like ECC, but DDR5 is a step in the right direction.
I used to work on a product where the app server and database were in the same rack - so similarly low latency. But the product was successful, so our N+1s would generate thousands of queries, and 1ms per query would easily add up to >500ms. Every other month we would look at New Relic and find some slow spot.
It was a Rails app, therefore easy to get into the N+1 but also somewhat easy to fix.
For our Rails app we actually added tests asserting no N+1s in our controller tests. Think a test setup with 1 post vs 10 posts (via FactoryBot), where you assert that the DB query count is the same in both cases. A useful technique for any Railsheads reading this!
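For anyone who wants to try it, a minimal sketch of that kind of assertion (RSpec + FactoryBot assumed; statements counted via ActiveSupport::Notifications, schema queries excluded):

def count_queries(&block)
  count = 0
  counter = ->(*args) do
    payload = args.last
    count += 1 unless payload[:name] == "SCHEMA"
  end
  ActiveSupport::Notifications.subscribed(counter, "sql.active_record", &block)
  count
end

it "does not scale queries with the number of posts" do
  create_list(:post, 1)
  queries_for_one = count_queries { get :index }

  create_list(:post, 9)
  queries_for_ten = count_queries { get :index }

  expect(queries_for_ten).to eq(queries_for_one)
end

If the index action has an N+1, the second count jumps and the test fails.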
Way back in the prehistoric era of Rails I just wrote a ~5 line monkey patch to ActiveRecord that would kill Mongrel if queries per request went above a limit.
Probably some of the most valuable code I've ever written on a per LOC basis lol.
But anyhow, merging that into a new project was always a fun day. But on the other side of the cleanup the app stops falling down due to memory leaks.
There's a common access pattern with object-relational mapping frameworks where an initial query is used to get a list of ids, then individual queries are emitted for each item to get its details. For example, if you have a database table full of stories, and you want to see only the stories written by a certain author, it is common for a framework to have a function like
stories = get_stories(query)
which results in a SQL query like
SELECT id FROM stories WHERE author = ?
with the '?' being bound to some concrete value like "Jim".
Then, the framework will be used to do something like this
for id in stories {
    story = get_story_by_id(id)
    // do something with story
}
which results in N SQL queries of the form
SELECT title, author, date, content FROM stories WHERE id = ?
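The standard fix is to fetch everything in one round trip instead, e.g.

SELECT title, author, date, content FROM stories WHERE author = ?

so the framework hydrates all N stories from a single query rather than N+1 of them.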
Oh yeah, the ORM thing (common side-effect with DB query abstractions) - I must not have been fully awake. Cheers and thank you for humoring me, @cbm-vic-20!
The thing where your app displays 20 stories in the homepage, but for each story it runs an extra query to fetch the author, and another to fetch the tags.
It's usually a big problem for database performance because each query carries additional overhead for the network round trip to the database server.
SQLite queries are effectively a C function call accessing data on local disk so this is much less of an issue - there's an article about that in the SQLite docs here: https://www.sqlite.org/np1queryprob.html
The N+1 problem basically means instead of making one efficient query, you end up making N separate queries inside a loop. For example, fetching a list of tables, then for each table fetching its columns individually — that’s N+1 queries. It works, but it’s slow.
We ran into this while building, funnily enough, a database management app called DB Pro (https://dbpro.app). At first we were doing exactly that: query for all schemas, then for each schema query its tables, and then for each table query its columns. On a database with hundreds of tables it took ~3.8s.
We fixed it by flipping the approach: query all the schemas, then all the tables, then all the columns in one go, and join them in memory. That dropped the load time to ~180ms.
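For the curious, the batched version is roughly three catalog queries (sketched here against a Postgres-style information_schema; your DB's catalog will differ), joined in application memory on (schema, table):

SELECT schema_name FROM information_schema.schemata;
SELECT table_schema, table_name FROM information_schema.tables;
SELECT table_schema, table_name, column_name FROM information_schema.columns;

Three round trips total, no matter how many tables there are.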
N+1 is one of those things you only really “get” when you hit it in practice.
Object Relational Mapping (ORM) tools, which focus on mapping between code-based objects and SQL tables, often suffer from what is called the N+1 problem.
A naive ORM setup will often end up doing 1 query to get a list of the objects it needs, and then perform N queries, one per object, usually fetching each object individually by ID or key.
So for example, if you wanted to see “all TVs by Samsung” on a consumer site, it would do 1 query to figure out the set of items that match, and then if say 200 items matched, it would do 200 queries to get those individual items.
ORMs are better at avoiding it these days, depending on the ORM or language, but it still can happen.
I dislike ORMs as much as the next ORM disliker, but people who are more comfortable in whatever the GP programming language is than SQL will write N+1 queries with or without an ORM.
Very true. But ORMs did make it particularly easy to trigger N+1 selects.
It used to be a very common pitfall - and often not at all obvious. You’d grab a collection of objects from the ORM, process them in a loop, and everything looked fine because the objects were already rehydrated in memory.
Then later, someone would access a property on a child object inside that loop. What looked like a simple property access would silently trigger a database query. The kicker was that this could be far removed from any obvious database access, so the person causing the issue often had no idea they were generating dozens (or hundreds) of extra queries.
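In ActiveRecord terms the trap and the fix look something like this (model names hypothetical):

posts = Post.where(published: true).to_a      # 1 query; looks innocent
posts.each { |post| puts post.author.name }   # N hidden queries, one per post

posts = Post.includes(:author).where(published: true)   # eager-loads authors
posts.each { |post| puts post.author.name }             # 2 queries total, no surprises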
This problem is associated with ORMs, but the moment there's a get_user(id) function which does a select, and you need to display a list of users, someone will run it in a loop to generate the list, and it will look like it's working until the user list gets long.
I really wish there was a way to compose SQL so you can actually write the dumb/obvious thing and it will run a single query. I talked with a dev once who seemed to have the beginnings of a system that could do this. It leveraged async, put composable query-ish objects into a queue, kept track of which callers needed which results, then merged and executed the single query and returned the results. Obviously far from generalizable for arbitrary queries, but it did seem to work.
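A toy, synchronous version of that idea (all names hypothetical; the real thing was async): callers enqueue ids and get back thunks, and forcing the first thunk runs one merged query for everyone.

class BatchLoader
  def initialize(&batch_fn)
    @batch_fn = batch_fn   # maps an array of ids to an {id => record} hash
    @pending  = []
  end

  def load(id)
    @pending << id
    -> { results[id] }     # a thunk; forcing it triggers the shared query
  end

  private

  def results
    @results ||= @batch_fn.call(@pending.uniq)
  end
end

loader = BatchLoader.new { |ids| User.where(id: ids).index_by(&:id) }
thunks = [1, 2, 3].map { |id| loader.load(id) }
users  = thunks.map(&:call)   # one query: ... WHERE id IN (1, 2, 3)

GraphQL's dataloader libraries work on the same principle.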
I think many ORMs can solve (some of) this these days.
e.g. for ActiveRecord there's ar_lazy_preloader[0] or goldiloader[1], which fix many N+1s by keeping track of a context: you load a set of Users in one go, and when you do user.posts it will do a single query for all of them, and when you then access post.likes it will load all likes for those, and so on. Or, if you get the records some other way, you add them to a shared context and then it works.
In defense of the application developer: it is very difficult to adopt the set-theory thinking that helps with SQL when you've never had any real education in this area, and it's tough for almost everyone to switch between it and the loop-oriented processing you're likely using in your application code. ORMs bridge this divide, which is why people fall into the trap so consistently. Often it's an acceptable trade-off for the value you get from the abstraction, but then you pay the price when you need to address the leak!
I mean, that's not much of a trade-off, given that it seems what you're saying is that using such a service might just show you how shit your code actually is.
I don't understand all the pessimism and incredulity about the valuation. This is an acquisition to take on and disrupt Apple.
Ive + Altman is perceived as a viable successor to the Ive + Jobs partnership that made Apple successful.
Apple is weak and doesn't seem capable of innovating anymore, nor do they seem to understand how to build AI into products.
There's an opportunity to build an Apple-sized hardware wearables company with AI at its core, just as Altman built ChatGPT and disrupted Google-scale search.
How exactly does OpenAI go about disrupting Apple? Are they going to build an entire OS, a line of hardware products, and a massive developer ecosystem for apps to be available?
I just don't exactly see how that is done by hiring a bunch of designers into a company whose current offering is a chatbot & API interface.
I don't think ChatGPT really disrupted Google search? It definitely forced Google to release Gemini + related products, though. Google still has millions of users, and they now have AI integrated with search. The latest Gemini models are also as capable as, if not more capable than, some of OAI's models.
I don't see how Altman is going to disrupt Apple with just Ive and a company no one's heard of before.
What? Note for any juniors reading this: DO NOT TRY THIS AT HOME.
Does the author enjoy writing code primarily because they enjoy typing?
Do they lack the mental discipline to think and problem-solve whilst using the living heck out of an AI autocomplete?
What's the fun in manually typing out code that has literally been written, copied, copied again, and then re-copied so many times before that the LLM can predict it?
Isn't it more dangerous to not learn the new patterns / habits / research loops / safety checks that come with AI coding? Surely their future career success will depend on it, unless they are currently working in a very, very niche area that is guaranteed to last the rest of their career.
I'm sorry, this is a truly unnatural and absurd reaction to a very natural feeling of being out of our comfort zone because technology has advanced, which we are currently all feeling.
Sorry, I probably phrased that poorly - when you’re coding with AI, you should get in the habit of spending more time checking for security mistakes: not sanitizing inputs, not scoping access properly. The same mistakes a junior or mid-level would make, but unlike them, the AI will not doubt itself and highlight particular code it wrote, asking “is this right?”. So you need to develop the habit of being careful.
The fundamental problem here is lack of context - a human at your company reading that text would immediately know that Gorilla was not an insider term, and it’d stick out like a sore thumb.
But imagine a new employee eager to please - you could easily imagine them OK’ing the document and making the same assumption the LLM did - “why would you randomly throw in that word if it wasn’t relevant”. Maybe they would ask about it though…
Google search has the same problem as LLMs - some meanings of a search text cannot be disambiguated with just the context in the search itself, but the algo has to best-guess anyway.
The cheaper input context for LLMs gets, and the larger the context window grows, the more context you can throw in the prompt, and the more often these ambiguities can be resolved.
Imagine, in your gorilla-in-the-steps example, if the LLM was given the steps but you also included the full text of Slack/Notion/Confluence as a reference in the prompt. It might succeed. I do think this is a weak point in LLMs though - they seem to really, really not like correcting you unless you display a high degree of skepticism, and then they swing to the opposite extreme and make up problems just to please you. I’m not sure how the labs are planning to solve this…
I think the assumption is that this is going into a somewhat modern password hashing algorithm like Argon2, bcrypt (created in 1999 - that's a quarter-century ago), or scrypt, with a salt. Under those assumptions, the calculations aren't reusable, and definitely not at 1B passwords/second.
If that's not true and the password is being stored using MD5 (something that's been NIST-banned at this point for over a decade), then honestly all bets are off, and even 128 bits of entropy might not be enough.
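For scale, the brute-force arithmetic on 128 bits, even at that generous 1B guesses/second rate:

2.0**128 / 1e9            # => ~3.4e29 seconds of guessing
2.0**128 / 1e9 / 3.156e7  # => ~1.1e22 years (~3.156e7 seconds per year)

so the "not enough" worry is less about raw search space and more about reused calculations, like rainbow tables over unsalted MD5.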
Rails went through a down period from 2014-2020 for several reasons:
1. React burst on the scene in 2014
2. the hyperscale FANG companies were dominating the architecture meta with microservices, tooling, etc., which worked for them at 500+ engineers but made no sense for smaller companies.
3. there was a growing perception that "Rails doesn't scale" as selection bias kicked in - companies that successfully used Rails to grow then became big enough to justify migrating off to microservices or whatever.
4. Basecamp got caught up in the DEI battles and got a ton of bad press at the height of it.
5. Ruby was legitimately seen as slow.
The big companies that stuck with Rails (GitHub, Shopify, GitLab, etc.) did a ton of work to fix Ruby perf, and it shows. Shopify in particular deserves an enormous amount of credit for keeping Ruby and Rails going. Their continued existence proves that Rails does, in fact, scale.
Also the meta - tech-architecture and otherwise - seems to be turning back in DHH's favor; make of that what you will.
The RoR hype started to wane long before React. You're really missing a huge part of our industry:
- While most 2nd or 3rd tier tech companies don't need Google scale infrastructure, SOA in Java/C# and then Go is incredibly prevalent. Many teams never had a reason to even look at RoR and its considerably worse language and runtime.
- Good ideas from RoR were copied by pretty much every ecosystem; again, most people never wanted Ruby in the first place.
You might mean Shopify, not Spotify. I think Spotify is Python/Go, whereas Shopify was started by a Rails core contributor and probably has the biggest Rails deployment in the world.
>which worked for them at 500+ engineers, but made no sense for smaller companies
The number of hi-tech companies that grew from 37signals size to Uber size has also increased for various reasons: SaaS becoming more and more accepted, Wall Street loving SaaS, and in general just more investment money in the market.
I had a bad experience with Action Cable + Redis (extremely high memory overhead, tons of dead Redis connections), so it's a bit "fool me once" with regard to Action Cable.
The main argument for caching in the DB (the slight increase in latency going from in-memory to DB is more than countered by the cheapness of DB cache space, which lets you have tons more cache) is one of those brilliant ideas that I would like to try at some point.
Solid Queue - I'm just 100% happy with Sidekiq at this point; I don't understand why I'd switch and introduce potential instability/issues.
This was interesting, especially with the DCF example at the end - it’s pertinent to business sell decisions (assuming your ownership structure allows you to make the decision): should I sell at an 8x multiple of revenue, or hold at an X% growth rate and Y% cash flow? What’s my net after 10 years?
The point of Jensen’s inequality, if I understand correctly, is that you’d underestimate the value of holding using a basic point-estimate approach, because you’ll underestimate the compounding cash flow from growth?
It depends on the shape of future returns. One tends to draw the optimistic version of Jensen’s inequality (and, in general, of convex curves), but the inequality flips for concave functions.
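A toy example of the convex case: say the growth rate is 0% or 20% with equal odds and you hold for 10 years. Plugging the average rate (10%) into the compounding formula underestimates the true expectation, because (1+r)^10 is convex in r:

(1.0**10 + 1.2**10) / 2   # => ~3.60, the true expected multiple
1.1**10                   # => ~2.59, the naive "plug in the average" estimate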
The raw data is available for download and you can compare not getting into any accidents to their number of accidents per however many hundreds of thousands of miles.
There isn't much to the data available for download, but it looks like 0.00001207261588 accidents per mile, or ~1.2 accidents per 100,000 miles (268/22199000). Figuring your father drives 15k miles per year, times 30 years and rounding up to 500k miles, Waymo has a recorded 6 accidents to your father's 0.
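Or, restating the same numbers as a quick sanity check:

268 / 22_199_000.0              # => ~1.2e-05 accidents per mile
500_000 * (268 / 22_199_000.0)  # => ~6.0 expected accidents over ~500k miles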
Not sure why that's an interesting comparison, however.
Assuming your dad is good at not driving when he shouldn't (tired/drunk/angry), he's not on the road when it's worrisome. I don't worry about getting into accidents with drivers who aren't on the road, I worry about the tired/drunk/angry drivers I do have to share the road with. Waymo at 2:15am after the bars let out is much less worrisome than any other car at that time, because I have no idea who's in that other car. Your father could be the safest driver ever, but I have no idea if it's him in the other car, or if that driver is totally blacked out and shouldn't be driving.
Thanks for doing the math and making this concrete!
I think it’s interesting because:
1) It gives Waymo a higher target to shoot for - it hasn’t “solved” self-driving just because it’s safer than the average driver. I am so impressed by Waymo, but I feel like some of this article smacked of premature “mission accomplished” vibes. The fact that it just accepted the comparison to the average without caveat is an example of that.
2) As a matter of policy, everyone can agree that a Waymo ride home for the tipsy is good, but where policy will have issues is convincing good drivers such as my dad to take Waymos everywhere. Not to mention most drivers irrationally think they’re way better than average - that will affect policy in a real way.
I wouldn’t be surprised if the top x% of drivers would outperform Waymo at this current juncture (especially, perhaps, in things like heavy rain).
However, I’ll use myself as an example. I’ve driven over 1.2 million miles over the last 17 years. (It may be closer to 1.5 million but I’m only counting miles on vehicles I’ve owned as that’s easier for me to calculate quickly ).
Without an accident.
I know that I get tired and my driving skills drop[1]. I have to sneeze or cough and they drop. I’m on a long continuously straight road, stressed about xyz and they drop.
Self-driving cars won’t necessarily have the same shortcomings.
Even the self-driving features that my car has (very limited; 2023 Infiniti Q50 Red Sport 400) are at times better than I am. (Though in my opinion it often waits far too long to brake when using the adaptive cruise, even at the maximum following distance.)
However, I do think humans still have advantages in some situations. (If you can, on average, track what’s around you and try to think as close as possible to 12 seconds ahead… if you haven’t, try it sometime on the highway.)
[1] perhaps luckily, I have a tendency to drive slower when fatigued. However at times I’ve pushed it, realized I’m going 15 mph below the speed limit, and realized I needed to not be driving. I also know that when not fatigued my lane assist will usually trigger 0 times on a normal 20-30 mile drive. When fatigued, it may begin to trigger once every 10 minutes. To clarify, I’m not out of my lane, but I’m moving out of center.
Why? We have a major problem right now; cars are deathtraps and roads are murder weapons. We haven't been able to do anything about that historically without taking unacceptable economic damage, but we're right on the cusp of massive improvements to the situation.
How the top 10% of drivers fare amongst all that isn't really a factor as far as I can see. They'll probably end up banned from taking the wheel at some point for consistency's sake, but they are ultimately not really a factor. Besides, automated cars will overtake (hehe) their skills at some point whether we track it specifically or not; in the long term humans can't compete against an engineered process.