I have come to the conclusion that a monorepo mainly compensates for failures in maintaining separation between components (like ensuring dependencies flow in only one direction with no cycles, ensuring backwards compatibility, self-service, etc.), and to some extent for an out-of-control microservice craze.
Kinda like how giving unrestricted access to your PROD databases does improve efficiency, but at the cost of additional risk, eroded separation between applications, a lack of APIs that let users self-serve, etc.
Especially with microservices: these tend to fare poorly if you don't solve the fixed costs of maintaining a service. So rather than invest in proper tooling, the answer becomes "let's just plop it all into a single repository". It papers over some of the problems, a little.
Now, I don't want to say a monorepo is bad in itself. The problem is when it stunts people's ability to maintain proper APIs and a proper development process.
That's the whole point of a monorepo, is it not? You get rid of all the ceremony around APIs and just say "whoever modifies an API is responsible for updating all the API consumers as well as the API itself" (and the reverse, too).
This means that developers are free to write code that is correct now, rather than being limited by technical decisions made in the past that may no longer be valid.
It also means that when you do update an API you are confronted with the technical reality of how the API consumers actually work. This may cause you to reevaluate and improve your mental model of the API.
If your API is siloed away in its own repo, you can easily end up taking it in a direction that isn't actually in line with what its consumers need.
Whenever you update the code of separate services, you still need to update all running instances at the same time. Being able to change the code in one commit does not mean the services are deployed at the same time, and that gap will cause API errors.
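To make the gap concrete, here is a minimal sketch (all names and payload shapes are made up for illustration): during a rolling deploy, old and new instances coexist for a while, so a server-side handler has to tolerate both the old and the new request shape.

```python
# Hypothetical sketch: while a rollout is in flight, not-yet-updated clients
# still send the old payload shape, so the server must accept both.

def handle_request(payload: dict) -> dict:
    # Prefer the new field name, falling back to the old one
    # sent by clients that haven't been redeployed yet.
    full_name = payload.get("full_name") or payload.get("name")
    if full_name is None:
        raise ValueError("missing name field")
    return {"greeting": f"Hello, {full_name}"}

# Both client generations keep working mid-rollout:
handle_request({"name": "Ada"})        # old payload shape
handle_request({"full_name": "Ada"})   # new payload shape
```

A one-commit change that renames the field on both sides simultaneously skips exactly this tolerance window, which is where the mismatch errors come from.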
It depends on what kind of APIs we're talking about. There's APIs in the sense of protobuf microservices, which are deployed individually and talk to live systems, but there's also APIs in the sense of libraries.
In my company the former is handled by never deprecating any field (and clients just pick and choose the fields they want). If a major schema change is required, you spin up an entirely new service, tell people to migrate over, and eventually turn off the old one when nobody's using it anymore.
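The "only ever add fields" policy can be sketched like this (the fields and clients below are hypothetical, not the actual schema described above): the server keeps every field it has ever emitted, and each client picks out just what it needs.

```python
# Hypothetical sketch of never deprecating fields: new fields are added
# alongside old ones, and old ones are never removed.

SERVER_RESPONSE = {
    "id": 42,
    "name": "widget",          # original field, never removed
    "display_name": "Widget",  # richer replacement, added alongside
}

def legacy_client(resp: dict) -> str:
    # Still works forever, because "name" was never deprecated.
    return resp["name"]

def modern_client(resp: dict) -> str:
    # Prefers the new field, falls back for servers that predate it.
    return resp.get("display_name", resp["name"])
```

The cost is a schema that only ever grows; the benefit is that no deploy ordering between clients and the service is ever required.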
Library changes can be done atomically: change the API, change the call sites, and if the tests pass you're done. One may opt to use the same strategy as with the microservices here too.
Regardless, the dynamics of dealing with microservice API changes don't change based on whether you're in a monorepo or not. But a monorepo can help in the sense that some aspects of a service are version-controllable (namely the schema definition files), and it's in the clients' best interest to have those always up to date rather than having to remember to run some out-of-band syncing command all the time.
If you wanted to you could spin up new clusters of every changed service and direct traffic such that every version of a service is a separate cluster. Then slowly redirect external traffic to new services.
Every internal service would always hit the exact version it was compiled with and you only need to worry about external api compatibility at that point.
For most use cases you can just get some scheduled downtime, though.
Common ownership of code doesn't work, though. People become experts in the things they work on a lot, and will refactor them better for it because they have long-term plans they are working towards. Someone outside making updates will make a mess of those long-term plans, not understanding where the code needs to go.
Maybe I missed some implied context, but I would still assume a CODEOWNERS model for the monorepo, wherein _someone_ is the expert/owner of any given folder of code and is brought into conversations when others are touching that code.
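On GitHub, for instance, that model is expressed with a CODEOWNERS file mapping path patterns to owners; the paths and team names below are made up for illustration:

```
# Hypothetical CODEOWNERS file: the last matching pattern wins.
*                @acme/platform-team   # fallback owner for everything
/libs/billing/   @acme/billing-team    # billing experts own billing code
/services/auth/  @acme/auth-team
```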
Yep. Bazel has a visibility feature that indicates what projects are allowed to consume a given thing. This is used by the owner of the library to indicate to what extent they are willing to support the library. Some libraries are only meant to be used by the immediate team, some are designed to be useful to other teams.
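Concretely, that looks something like this in a Bazel BUILD file (target names and paths here are invented); `visibility` lists which packages may depend on the target:

```
# Hypothetical BUILD file: only targets under //payments may
# depend on this library, signaling the owner's support scope.
cc_library(
    name = "ledger",
    srcs = ["ledger.cc"],
    visibility = ["//payments:__subpackages__"],
)
```

A target outside //payments that tries to depend on :ledger fails at build time, so the support boundary is enforced mechanically rather than by convention.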
In the same vein, at my company we have a mechanism to specify package ownership, and the owners may opt to make themselves mandatory code reviewers for any incoming change.
IMHO, this is nicer than multi-repo because you get a lot more visibility into who's actually using what and you can enforce some level of accountability, which means you don't get into awkward situations where A made a breaking change, B uses A but never upgraded and now C is trying to deal with a newly discovered vulnerability affecting A and B.
Right, what you outline here is exactly what I have in mind when I think of monorepo. I actually always assume the CODEOWNER is a mandatory code reviewer in this model. The import rules are less a guarantee since they rely so much on tooling support.
The way we do it is a bit more complex to account for meatspace:
A project may have ownership data; if it does, the owner defaults to being an optional reviewer. If the project doesn't have ownership data, ownership bubbles up the folder tree to the nearest folder that does have such information.
We organize projects such that increasing folder depth also increases ownership specificity (e.g. at the project level, the folder structure implies a specific team has ownership, one level up is their business vertical, one level up is the cost center category and so on, all the way up to root, which represents "everyone"). With this scheme, we can reassign ownership in situations like when the sole owner of a project leaves the company, or if teams get restructured. And by not making review mandatory, we are able to unblock cases like landing a high priority security patch while the reviewer of one of the affected packages is on vacation.
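The bubbling-up lookup described above amounts to walking from a path toward the repo root until ownership data is found. A minimal sketch, with an invented ownership table (the real mechanism is presumably backed by per-folder metadata files):

```python
# Hypothetical ownership table: deeper paths mean more specific owners,
# and the repo root ("") represents "everyone".
OWNERS = {
    "": "everyone",                   # root: everyone
    "payments": "payments-vertical",  # business vertical
    "payments/ledger": "ledger-team", # specific team
}

def resolve_owner(path: str) -> str:
    """Walk up the folder tree until a folder with ownership data is found."""
    parts = path.split("/")
    while parts:
        candidate = "/".join(parts)
        if candidate in OWNERS:
            return OWNERS[candidate]
        parts.pop()  # bubble up one folder level
    return OWNERS[""]
```

With this scheme, `resolve_owner("payments/ledger/core")` resolves to the team, while a path with no specific owner falls all the way back to "everyone", which is what makes reassignment after departures or restructures a pure metadata edit.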
The failings of scaling source control should not be what enforces your dependency graph. Some other tool that actually understands the semantics of your code can do that. Source control should just support any workflow you throw at it.
And this is right. In an ideal world, the choice of source control would not influence how you design your application.
Unfortunately, the world is not ideal. Some problems that are easily detected when multiple applications have their own teams, APIs, release schedules, and repositories stop being naturally detectable when every developer has access to every repository and can introduce changes in sync on both the client and the service side.
One company I worked for used a monorepo. Developers would shortcut the development process and skip certain practices by modifying both the client and the server at the same time, in sync, without caring about backwards compatibility.
Then there was a huge outage, because one such change required the client and the service to be deployed at exactly the same time. But in a distributed system no two things happen at exactly the same time, so there was a short window where the services were mismatched and some broken data was saved. A day later that broken data completely destroyed an important production batch, at a large loss to the company.
And while it is definitely fun, convenient, and efficient to be able to do just that, it also requires a little care so that the whole system does not deteriorate over time.
How does forcing two commits to two repos help this example in any way? You still need the testing to catch backwards compatibility and that can be done in a monorepo. It can arguably be done better because you have a more direct link to the downstream code.