Personal Git Monorepo

If you work on a lot of open-source or personal software projects spanning multiple repos, you’ll find a fair amount of tooling drift across them. Keeping all this tooling up to date was taking a decent amount of my time, so I decided to look into a personal monorepo as a way to simplify tooling by sharing a single git repo.

While I could have built out additional tooling to treat multiple repositories as one, I had accumulated a sufficient amount of technical debt over the years that adding more software on top didn’t seem palatable.

Some ways this incidental complexity manifested:

  • multiple version control systems (I had migrated several repos to Fossil, which I still think is great, but it's not meant for this sort of mess)
  • inconsistent branch naming (some still had master instead of main)
  • inconsistent development tool stacks (missing gems, inconsistent configurations, directory layout differences, multiple testing suites)
  • multiple languages (this has made code sharing harder and requires more mental context switching)
  • difficulty keeping projects in sync with third-party code
  • an inability to do codemod-style refactor changes over all projects

It’s unclear if this will be how I handle my collective source code long-term, but the exercise itself is already yielding fruit. It becomes more obvious how much incidental complexity arises when you’ve got 50+ repos. Many of them are one-offs, or have been waiting to be open-sourced but need too much one-off effort to bring the quality up to a level where I’d be comfortable releasing the code.

I don’t plan on releasing the monorepo itself, for a few reasons:

  • Some personal projects won’t be open-sourced, for various reasons
  • I think it’s easier for folks to use and contribute to a smaller, focused git repo than to pull down a few decades’ worth of unrelated version control history
  • In the GitHub PR process, it’s easier for me to figure out what a contributor is trying to accomplish when the repo is small and focused

How I’m currently doing it

I’ve got a single git repository that has multiple remotes, one for each project. To keep the tooling simple to write, I stick to two git conventions:

  1. The top-level directory name matches the remote name
  2. All remotes pull the main branch (I’ve been renaming branches as I pull older repos in)

To glue everything together, I use a git script called subtree and wrote a few small scripts that handle the repository conventions - subtree-add, subtree-push, and subtree-pull. There are more advanced things you can do with subtrees (Micheal Jones has done a good job documenting them), but so far my scripts have handled the common cases of dealing with remotes nicely, and I’ve also seen a few gitconfig aliases floating around the Internet. Depending on later tooling needs, I might eventually fork subtree and bake my conventions in, along with a few other commands for removing history, etc. A quick read of the scripts shows that there are no changes to the git plumbing - what these tools boil down to is a way to consistently operate on a git tree with directory prefixes. It wouldn’t take much effort to port this to a non-git VCS, but at this point I’m sticking with git largely out of inertia.
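
To make the two conventions concrete, here is a minimal sketch of how wrappers like subtree-add, subtree-pull, and subtree-push could build their git commands. These are not the actual scripts: the helper names (subtree_command, subtree_add_commands) and the example URL are hypothetical, and the only git features assumed are `git remote add` and `git subtree add|pull|push --prefix=<dir> <repository> <ref>`.

```ruby
# Hypothetical helpers, not the author's scripts: they only build
# argument lists, so nothing touches a repository until you run them.

# Convention 2: every remote is tracked on the main branch.
SUBTREE_BRANCH = "main"

# Builds the git-subtree invocation for one project.
# Convention 1: the directory prefix is the same as the remote name.
def subtree_command(action, remote)
  unless %w[add pull push].include?(action)
    raise ArgumentError, "unknown subtree action: #{action}"
  end
  ["git", "subtree", action, "--prefix=#{remote}", remote, SUBTREE_BRANCH]
end

# Importing a new project: register the remote, then add it as a subtree
# under a top-level directory of the same name.
def subtree_add_commands(remote, url)
  [
    ["git", "remote", "add", remote, url],
    subtree_command("add", remote),
  ]
end

# To actually run a command, pass the list to Kernel#system, e.g.:
#   system(*subtree_command("pull", "example-project"))
```

Because the prefix, remote, and branch all follow from a single name, each wrapper needs only one argument (plus a URL when adding), which is what keeps this kind of tooling small.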

I’ve got a Rakefile at the base of my monorepo that automates the grunt work of pushing and pulling remote changes en masse, as well as a few CI-style tasks that download dependencies and run tests.
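
The en-masse part can be sketched in plain Ruby. This is an illustration under assumptions, not the real Rakefile: the function names are hypothetical, and it assumes a project is any remote name that also exists as a top-level directory, synced on the main branch.

```ruby
# Hypothetical sketch of bulk syncing; names are illustrative only.

# A project is a remote whose name matches a top-level directory.
# Remotes without a directory (or directories without a remote) are skipped.
def monorepo_projects(remotes, directories)
  (remotes & directories).sort
end

# Builds one git-subtree command per project for the given action
# ("pull" or "push"). Nothing is executed here.
def bulk_subtree_commands(action, remotes, directories, branch: "main")
  monorepo_projects(remotes, directories).map do |name|
    ["git", "subtree", action, "--prefix=#{name}", name, branch]
  end
end

# Inside a Rakefile this might be wired up roughly as follows, where
# `remotes` would come from `git remote` and `dirs` from the top-level
# directory listing:
#   task(:pull) { bulk_subtree_commands("pull", remotes, dirs).each { |c| sh(*c) } }
```

Keeping command construction separate from execution makes it easy to print the commands as a dry run before letting a task loose on 50+ projects.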

Would you benefit from a monorepo?

I’m hesitant to argue for or against - I think it’s a nuanced subject that’s heavily context-dependent. My needs (or those of Google, Facebook, or other monorepo users) aren’t your needs, and vice versa. At the least, I would ask the following questions before moving forward.

  • Do you have 30-50+ repos that you’ve touched in the past few years? Perhaps seen another way - have you tracked how much time you actually spend moving from project to project?
  • Are you familiar with your version control system’s internals (i.e., beyond the xkcd-comic level of familiarity)? When you have to handle gnarly merges, move stuff around, or deal with performance issues, you’re treading into “won’t find exact/any answer on StackOverflow” territory. Do you want to own that? Do you want to spend your time on that?
  • Are you willing to build and massage tooling that is specific to that monorepo and has zero benefit to anyone or anything else? Tooling isn’t always some 5,000-line C++ monster - it’s the shell scripts that glue stuff together, the scraps of documentation that you’ve found useful, the 20-line Ruby program that handles syncing that one subrepo that “has some baggage.”
  • Are you sharing this with other people? How many people’s workflows are you supporting with this?

I’m not trying to scare you away from a monorepo, merely pointing out that the time and effort cost is non-zero. If you only work on a few projects (which is fine, by the way), it might be easier to keep a handful of git repos and call it a day. You don’t have to do whatever the cool kids on Hacker News are doing today. If you’ve got to keep 20-30 slightly different build system configurations in your head, though, a monorepo (even as a temporary solution) could be useful.