JJ's conflict handling seems nice. Can't say conflicts are a big enough problem for me to call it a "major feature", but it seems like a nice improvement.
I'm not sure how I feel about JJ's "working copy commit". One of the great things about Meta's VCS is that all commits are automatically backed up into the cloud. Which seems incompatible with the JJ model? Not sure. I think the D in DVCS is *wildly* overrated. 99.999% of projects are de facto centralized.
I'm #TeamMonoRepo 100% of the way. My background is gamedev and Perforce, an industry that still uses Perforce because Git is poopy poop poop. The Git integration I want to see is the ability to easily sync a monorepo subfolder with an external GitHub repo. Syncing commits for internal projects that are open sourced requires a big ugly custom set of tooling. And I'd kind of like a way to do an "inner fork" within a monorepo, if that makes sense.
If you're interested, here's a pair of blog posts I wrote that have at least some of my thoughts on source control.

https://www.forrestthewoods.com/blog/dependencies-belong-in-...

https://www.forrestthewoods.com/blog/using-zig-to-commit-too...
> JJ's conflict handling seems nice. Can't say conflicts are a big enough problem for me to call it a "major feature", but it seems like a nice improvement.
The advantages people first think of when they hear about jj's conflict handling are usually that you can collaborate on conflicts and that you can leave conflicts for later. What's less obvious [1] is that being able to store conflicts in commits means that we can always rebase descendants, so there are no states like what Mercurial (and Sapling, I think) call "obsolete" and "orphan". There is also no "interrupted rebase" state when you're resolving conflicts.
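To give a rough intuition in code, here's a toy Rust sketch of storing a conflict as data rather than as textual markers. All the types are invented for illustration, and it assumes a simple two-sided conflict; jj's actual representation is more general:

```rust
// Toy model of a "first-class conflict" stored as data in a commit,
// loosely inspired by jj's idea of keeping the sides being added and
// the bases being removed, instead of baking marker text into files.
// Types here are invented for illustration, not jj-lib's real ones.

enum FileState {
    /// Ordinary resolved contents.
    Resolved(String),
    /// An unresolved merge. This sketch assumes exactly two sides
    /// added and one base removed.
    Conflicted { adds: Vec<String>, removes: Vec<String> },
}

/// Materialize conflict markers only at checkout time; the commit
/// itself keeps the structured form, so descendants can always be
/// rebased and resolution can happen whenever you like.
fn materialize(state: &FileState) -> String {
    match state {
        FileState::Resolved(text) => text.clone(),
        FileState::Conflicted { adds, removes } => format!(
            "<<<<<<< side 1\n{}\n||||||| base\n{}\n=======\n{}\n>>>>>>> side 2\n",
            adds[0], removes[0], adds[1]
        ),
    }
}

fn main() {
    let conflict = FileState::Conflicted {
        adds: vec!["left()".into(), "right()".into()],
        removes: vec!["base()".into()],
    };
    print!("{}", materialize(&conflict));
}
```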
Those properties simplify things for the user. They also simplify a lot for developers. An example is how I spent about 2 weeks on an `hg amend --into` command for amending the changes in the working copy into an ancestor. I then implemented the same thing in jj in under an hour. Much of the complexity in hg stemmed from handling the interrupted states that conflicts create. (Other complexity hg has that jj doesn't: dealing with a dirty working copy, dealing with concurrent operations, and simply the complexity of the APIs for creating new commits in memory.)
[1] IIRC, it took me about a year after I added support for "first-class conflicts" before I figured out that it meant we should simply always rebase descendants. Jujutsu had orphans and a `jj evolve` command before then.
Conflict handling is much more useful whenever you actually have conflicts arising regularly. :) For working on JJ itself, this is super useful because tons of code is still being written by 2-3 people, continuously. In other words, the velocity of newly written code is still very high, internal APIs change and break PRs, etc. I think how often you handle conflicts in a project comes down to a bunch of factors. But JJ really does make conflict handling not 1 but like 3-5 times easier, IMO. So you only really need to do it once to be wow-ed. (One of my previous coworkers said something like, and I quote, "This is fucking awesome.")
JJ has an abstract notion of a working copy and even an abstract notion of a backend that stores commits. jj is a set of Rust crates, and these are interfaces, so you can implement them with whatever backend you want. For example, the working copy can be described in terms of the filesystem (POSIX copy/rename/move files around) or in terms of a virtual filesystem (RPC to a server that tells you to make path x/y/z look like it did at version 123).
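As a rough sketch of what "the backend is an interface" means in practice. Names and signatures below are hypothetical, not jj-lib's actual traits:

```rust
// Hypothetical sketch of a pluggable commit-storage interface,
// loosely inspired by jj's design. All names are invented for
// illustration; see the jj-lib crate for the real traits.

use std::collections::HashMap;

type CommitId = String;

#[derive(Clone)]
struct Commit {
    parents: Vec<CommitId>,
    tree_id: String, // points at a snapshot of the file tree
    description: String,
}

/// Anything that can store and retrieve commits can back a repo:
/// local files, a Git object store, or an RPC client for a server.
trait CommitBackend {
    fn read_commit(&self, id: &CommitId) -> Option<Commit>;
    fn write_commit(&mut self, id: CommitId, commit: Commit);
}

/// Trivial in-memory backend; a "cloud commits" backend would
/// instead issue RPCs to a central server here.
struct InMemoryBackend {
    commits: HashMap<CommitId, Commit>,
}

impl CommitBackend for InMemoryBackend {
    fn read_commit(&self, id: &CommitId) -> Option<Commit> {
        self.commits.get(id).cloned()
    }
    fn write_commit(&mut self, id: CommitId, commit: Commit) {
        self.commits.insert(id, commit);
    }
}

fn main() {
    let mut backend = InMemoryBackend { commits: HashMap::new() };
    backend.write_commit(
        "abc123".into(),
        Commit { parents: vec![], tree_id: "tree1".into(), description: "initial".into() },
    );
    assert!(backend.read_commit(&"abc123".to_string()).is_some());
}
```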
You can absolutely have "cloud commits" like Mononoke/Sapling does, and that is a central feature desired at Google too, where Martin works. Skip to the end of this talk from Git Merge 2024 a few weeks ago, and you can see Martin talk about their version of `jj` used at Google, that interacts with Piper/CitC: https://www.youtube.com/watch?v=LV0JzI8IcCY
My understanding is that the design works something like this (no promises, but we may approximate something like it in JJ itself): A central server (Piper) stores all the data. A pair of local daemons runs on your machine: a proxy daemon that talks to the server on your behalf (used to reduce RPC latency for e.g. commits and writes, and to upload asynchronously) and a virtual filesystem daemon (CitC). The virtual filesystem daemon manages your working copy. It "mounts" the repository at version 123 (the "baseline" version) onto a filesystem directory by talking to the server, then tracks changes to that directory as you update files, sort of like OverlayFS does for Docker.
When you make a commit, you tell the VFS layer to snapshot the changes between your working copy and the baseline, uploading them to the server. Then you tell the backend that you've created version 124 out of those files/blobs. Version 124 is now your new baseline. Rinse and repeat to create versions 125, 126, etc.
The TL;DR on that is the VFS layer manages the working copy. The server/proxy manage interaction with a backend.
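To make that loop concrete, here's a sketch in Rust. Every type and call below is a made-up stand-in for the VFS daemon and proxy RPCs, not any real Piper/CitC API:

```rust
// Illustrative sketch of the baseline/snapshot loop described above.
// Every type and function here is a hypothetical stand-in for the
// VFS daemon and proxy RPCs.

struct Vfs {
    baseline: u64, // e.g. version 123, mounted by the VFS daemon
}

struct ChangeSet {
    files: Vec<(String, Vec<u8>)>, // path -> new contents
}

impl Vfs {
    /// Diff the working directory against the mounted baseline.
    fn snapshot(&self) -> ChangeSet {
        ChangeSet {
            files: vec![("x/y/z".into(), b"new contents".to_vec())],
        }
    }
}

struct Proxy;

impl Proxy {
    /// Upload the changed blobs and ask the server to record a new
    /// version built from them; the server hands back its number.
    fn commit(&self, baseline: u64, changes: &ChangeSet) -> u64 {
        let _ = changes; // would be an asynchronous upload in reality
        baseline + 1 // e.g. 123 -> 124
    }
}

fn main() {
    let mut vfs = Vfs { baseline: 123 };
    let proxy = Proxy;

    // Snapshot working-copy changes, upload them, get version 124
    // back, and remount it as the new baseline. Rinse and repeat.
    let changes = vfs.snapshot();
    vfs.baseline = proxy.commit(vfs.baseline, &changes);
    println!("new baseline: version {}", vfs.baseline);
}
```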
OK, monorepo export of subprojects. For exporting monorepo subfolders, I'm actively exploring and thinking about "filtering" tools that can be applied to a commit graph to produce a new commit graph. Mononoke at Meta can do something like this. Check out Josh, a tool that does it for native Git repositories: https://josh-project.github.io/josh/reference/filters.html
The idea: say you have commits A -> B -> C and you want to export a repository that contains only the project located under path/to/oss/thing. You basically sweep the commit graph and retain only commits that touch path/to/oss/thing. If a commit does not touch this path, remove it from the graph and "stitch" the graph back together. If a commit does touch this path, keep it.
If a commit touches that path as well as paths that don't match, create a new commit that only touches files under that path. In other words, you might derive a new commit graph A' -> C', where B was thrown away because it only touched private code, and A' and C' are "filtered" versions of the originals that retain a subset of their changes. The secret is that this algorithm can be fully deterministic, assuming the input commit graph is immutable and cannot change.
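Here's a minimal sketch of that sweep for a linear chain of commits, assuming a single path-prefix filter. Real filter tools like Josh handle arbitrary DAGs, renames, merges, and more:

```rust
// Sketch of a deterministic path filter over a linear commit chain,
// the simplest case of the graph sweep described above. Data model
// is invented for illustration.

struct Commit {
    id: &'static str,
    files: Vec<&'static str>, // paths touched by this commit
}

/// Keep only commits touching `prefix`, rewriting each survivor to a
/// filtered commit that retains just the matching paths. Because the
/// input is immutable and the rule is pure, the output is deterministic.
fn filter_chain(chain: &[Commit], prefix: &str) -> Vec<Commit> {
    chain
        .iter()
        .filter_map(|c| {
            let kept: Vec<&'static str> = c
                .files
                .iter()
                .copied()
                .filter(|p| p.starts_with(prefix))
                .collect();
            if kept.is_empty() {
                None // drop B-style commits; the chain "stitches" itself
            } else {
                Some(Commit { id: c.id, files: kept }) // A', C', ...
            }
        })
        .collect()
}

fn main() {
    let chain = vec![
        Commit { id: "A", files: vec!["path/to/oss/thing/lib.rs", "private/notes.md"] },
        Commit { id: "B", files: vec!["private/secret.rs"] },
        Commit { id: "C", files: vec!["path/to/oss/thing/main.rs"] },
    ];
    // Produces A' -> C'; B is dropped because it touched no exported path.
    for c in filter_chain(&chain, "path/to/oss/thing") {
        println!("{}': {:?}", c.id, c.files);
    }
}
```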
This idea is kind of like applying a MATERIALIZED VIEW to a database. In fact, it could be a kind of continuous materialized view, where the derived graph is computed incrementally and automatically. But how do you express the filters? A filter is really a way of expressing a functional map between graphs. So should filters only be bijective? What if they're merely injective? Etc. I haven't fully figured this out.
So anyway, I've never really worked in gamedev or huge monorepo orgs before, but these are all problems I've acutely felt and that Git never really solved. So I can say that yes, we are at least thinking about these things and more.
> whenever you actually have conflicts arising regularly.
I mean, I work at Meta. We have lots of people working on stuff at the same time. :) Araxis Merge auto-merges quite nicely. Text conflicts have just never been a meaningful issue for me in my 15+ year career.
Now binary conflicts are a source of pain! Those unfortunately require Perforce-style file locking once you hit a certain team size.
> I've never really worked in gamedev or huge monorepo orgs before, but these are all problems I've acutely felt and that Git never really solved. So I can say that yes, we are at least thinking about these things and more.
Good to hear! Yeah, the workflow for subproject syncing isn't super obvious. It definitely can be done but will require a lot of thinking and a few iterations.