I have come to the conclusion that a monorepo mainly compensates for failures in maintaining separation between components (like ensuring dependencies flow in only one direction with no cycles, ensuring backwards compatibility, self-service, etc.), and to some extent for an out-of-control microservice craze.
Kinda like how giving unrestricted access to your PROD databases does improve efficiency, but at the cost of additional risk, eroded separation between applications, a lack of APIs that let users self-serve, etc.
Especially with microservices: these tend to fare poorly if you don't solve the fixed costs of maintaining a service. So rather than invest in proper tooling, the answer becomes "let's just plop it all into a single repository". It papers over some of the problems, a little.
Now, I don't want to say a monorepo is bad in itself. The problem is when it stunts people's ability to maintain proper APIs and a proper development process.
That's the whole point of a monorepo, is it not? You get rid of all the ceremony around APIs and just say "whoever modifies an API is responsible for updating all the API consumers as well as the API itself" (and the reverse, too).
This means that developers are free to write code that is correct now, rather than being limited by technical decisions made in the past that may no longer be valid.
It also means that when you do update an API you are confronted with the technical reality of how the API consumers actually work. This may cause you to reevaluate and improve your mental model of the API.
If your API is siloed away in its own repo, you can easily end up taking it in a direction that isn't actually in line with what its consumers need.
Whenever you update the code of separate services, you still need to update all running instances at the same time. Being able to change the code in one commit does not mean the services are deployed at the same time, and that gap will cause API errors.
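To make the gap concrete, here is a minimal sketch (all names and payload shapes are made up for illustration): during a rolling deploy, old and new instances coexist for a while, so a server-side handler has to tolerate both the old and the new request shape.

```python
# Hypothetical sketch: while a rollout is in flight, not-yet-updated clients
# still send the old payload shape, so the server must accept both.

def handle_request(payload: dict) -> dict:
    # Prefer the new field name, falling back to the old one
    # sent by clients that haven't been redeployed yet.
    full_name = payload.get("full_name") or payload.get("name")
    if full_name is None:
        raise ValueError("missing name field")
    return {"greeting": f"Hello, {full_name}"}

# Both client generations keep working mid-rollout:
handle_request({"name": "Ada"})        # old payload shape
handle_request({"full_name": "Ada"})   # new payload shape
```

A one-commit change that renames the field on both sides simultaneously skips exactly this tolerance window, which is where the mismatch errors come from.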
It depends on what kind of APIs we're talking about. There's APIs in the sense of protobuf microservices, which are deployed individually and talk to live systems, but there's also APIs in the sense of libraries.
In my company the former is handled by never deprecating any field (and clients just pick and choose the fields they want). If a major schema change is required, you spin up an entirely new service, tell people to migrate over, and eventually turn off the old one when nobody's using it anymore.
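The "only ever add fields" policy can be sketched like this (the fields and clients below are hypothetical, not the actual schema described above): the server keeps every field it has ever emitted, and each client picks out just what it needs.

```python
# Hypothetical sketch of never deprecating fields: new fields are added
# alongside old ones, and old ones are never removed.

SERVER_RESPONSE = {
    "id": 42,
    "name": "widget",          # original field, never removed
    "display_name": "Widget",  # richer replacement, added alongside
}

def legacy_client(resp: dict) -> str:
    # Still works forever, because "name" was never deprecated.
    return resp["name"]

def modern_client(resp: dict) -> str:
    # Prefers the new field, falls back for servers that predate it.
    return resp.get("display_name", resp["name"])
```

The cost is a schema that only ever grows; the benefit is that no deploy ordering between clients and the service is ever required.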
Library changes can be done atomically: change the API, change the call sites, and if the tests pass you're done. One may opt to use the same strategy as with the microservices here too.
Regardless, the dynamics of dealing with microservice API changes don't change based on whether you're in a monorepo or not. But a monorepo can help in the sense that some aspects of a service are version-controllable (namely the schema definition files), and it's in the clients' best interest to have those always up to date rather than having to remember to run some out-of-band syncing command all the time.
If you wanted to you could spin up new clusters of every changed service and direct traffic such that every version of a service is a separate cluster. Then slowly redirect external traffic to new services.
Every internal service would always hit the exact version it was compiled with and you only need to worry about external api compatibility at that point.
For most use cases you can just get some scheduled downtime, though.
Common ownership of code doesn't work, though. People become experts in the things they work on a lot, and will refactor them better for it because they have long-term plans they are working towards. Someone outside making updates will make a mess of those long-term plans, not understanding where the code needs to go.
Maybe I missed some implied context, but I would still assume a CODEOWNERS model for the monorepo, wherein _someone_ is the expert/owner of any given folder of code and is brought into conversations when others are touching that code.
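On GitHub, for instance, that model is expressed with a CODEOWNERS file mapping path patterns to owners; the paths and team names below are made up for illustration:

```
# Hypothetical CODEOWNERS file: the last matching pattern wins.
*                @acme/platform-team   # fallback owner for everything
/libs/billing/   @acme/billing-team    # billing experts own billing code
/services/auth/  @acme/auth-team
```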
Yep. Bazel has a visibility feature that indicates what projects are allowed to consume a given thing. This is used by the owner of the library to indicate to what extent they are willing to support the library. Some libraries are only meant to be used by the immediate team, some are designed to be useful to other teams.
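Concretely, that looks something like this in a Bazel BUILD file (target names and paths here are invented); `visibility` lists which packages may depend on the target:

```
# Hypothetical BUILD file: only targets under //payments may
# depend on this library, signaling the owner's support scope.
cc_library(
    name = "ledger",
    srcs = ["ledger.cc"],
    visibility = ["//payments:__subpackages__"],
)
```

A target outside //payments that tries to depend on :ledger fails at build time, so the support boundary is enforced mechanically rather than by convention.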
In the same vein, at my company we have a mechanism to specify package ownership, and the owners may opt to make themselves mandatory code reviewers for any incoming change.
IMHO, this is nicer than multi-repo because you get a lot more visibility into who's actually using what and you can enforce some level of accountability, which means you don't get into awkward situations where A made a breaking change, B uses A but never upgraded and now C is trying to deal with a newly discovered vulnerability affecting A and B.
Right, what you outline here is exactly what I have in mind when I think of monorepo. I actually always assume the CODEOWNER is a mandatory code reviewer in this model. The import rules are less a guarantee since they rely so much on tooling support.
The way we do it is a bit more complex to account for meatspace:
A project may have ownership data; if it does, the owner defaults to being an optional reviewer. If the project doesn't have ownership data, ownership bubbles up the folder tree to the nearest folder that does have such information.
We organize projects such that increasing folder depth also increases ownership specificity (e.g. at the project level, the folder structure implies a specific team has ownership, one level up is their business vertical, one level up is the cost center category and so on, all the way up to root, which represents "everyone"). With this scheme, we can reassign ownership in situations like when the sole owner of a project leaves the company, or if teams get restructured. And by not making review mandatory, we are able to unblock cases like landing a high priority security patch while the reviewer of one of the affected packages is on vacation.
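The bubbling-up lookup described above amounts to walking from a path toward the repo root until ownership data is found. A minimal sketch, with an invented ownership table (the real mechanism is presumably backed by per-folder metadata files):

```python
# Hypothetical ownership table: deeper paths mean more specific owners,
# and the repo root ("") represents "everyone".
OWNERS = {
    "": "everyone",                   # root: everyone
    "payments": "payments-vertical",  # business vertical
    "payments/ledger": "ledger-team", # specific team
}

def resolve_owner(path: str) -> str:
    """Walk up the folder tree until a folder with ownership data is found."""
    parts = path.split("/")
    while parts:
        candidate = "/".join(parts)
        if candidate in OWNERS:
            return OWNERS[candidate]
        parts.pop()  # bubble up one folder level
    return OWNERS[""]
```

With this scheme, `resolve_owner("payments/ledger/core")` resolves to the team, while a path with no specific owner falls all the way back to "everyone", which is what makes reassignment after departures or restructures a pure metadata edit.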
The failings of scaling source control should not be what enforces your dependency graph. Some other tool that actually understands the semantics of your code can do that. Source control should just support any workflow you throw at it.
And this is right. In an ideal world, the choice of source control would not influence how you design your application.
Unfortunately, the world is not ideal. Some problems that are easily detected when multiple applications have their own teams, APIs, release schedules, and repositories stop being naturally detectable when every developer has access to every repository and can introduce changes in sync on both the client and the service side.
One company I worked for used a monorepo. Developers would shortcut the development process and skip certain practices by modifying both the client and the server at the same time, in sync, without caring about backwards compatibility.
Then there was a huge outage, because one such change required the client and the service to be deployed at exactly the same time. But in a distributed system no two things happen at exactly the same time, so there was a short window where the services were mismatched and some broken data was saved. A day later that broken data completely destroyed an important production batch, at a large loss to the company.
And while it is definitely fun, convenient, and efficient to be able to do just that, it also requires a little care so that the whole system does not deteriorate over time.
How does forcing two commits to two repos help this example in any way? You still need the testing to catch backwards compatibility and that can be done in a monorepo. It can arguably be done better because you have a more direct link to the downstream code.