This is part 2 of a two-part blog post on our migration to a monorepo. If you haven’t yet read part 1 and want some more context, you can find it here.
In part 1 of this blog post, we discussed how our original polyrepo led to complicated workflows which hindered engineering productivity, why we believed that a monorepo would mitigate these problems, and the potential downsides of switching to a monorepo.
In this second part, we’ll review how we executed the migration to a monorepo without losing any commit history (that is, how we consolidated code that was previously scattered across repos into one repo with a centralized commit history) and elaborate on the benefits of the migration.
Consolidating code while retaining commit history
After weighing the pros and cons of a polyrepo vs a monorepo, we decided to migrate to a monorepo. But how would we port all code into one central repo while keeping commit history? In this section, we’ll share the series of steps we followed to consolidate our code without losing its history.
A simplified example
While our original polyrepo contained many repos, for the purpose of this explanation we’ll assume that we originally had three distinct repos, holding our client, SQL, and frontend code respectively. We want to consolidate these three repos into a monorepo, whose structure would look something like this.
- opaque
  - .gitignore
  - client/
  - frontend/
  - sql/
  - ...
In each repo we want to consolidate, we create a new branch, `monorepo`, and move all of the repo’s files into a newly created directory. The new directory should have the same name as the intended sub-directory in the monorepo, i.e., `client`, `frontend`, or `sql`.
As an example, on the `master` branch of our `opaque-client` repo, we initially have the following structure.
- opaque-client
  - .github/
  - client-docs/
  - demo/
  - python-package/
  - src/
  - .clang-format
  - .flake8
  - .gitignore
  - Dockerfile
  - README.md
  - pyproject.toml
  - requirements.txt
On the `monorepo` branch of our `opaque-client` repo, we migrate to the following structure:
- opaque-client
  - client/
    - .github/
    - client-docs/
    - demo/
    - python-package/
    - src/
    - .clang-format
    - .flake8
    - .gitignore
    - Dockerfile
    - README.md
    - pyproject.toml
    - requirements.txt
Note: By default, `mv src/path/* dst/path/` won’t move hidden files (i.e., files and directories beginning with `.`). As a result, we had to make sure we additionally ran `mv src/path/.* dst/path/`.
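To make the branch-and-move step concrete, here is a minimal sketch run against a throwaway repository. The temporary repo, its two files, and the commit messages are placeholders invented for illustration; only the `monorepo` branch name and the `client/` target directory come from the steps above. It uses `git mv` in a loop whose glob also covers dotfiles, which is one possible way to handle the hidden-file caveat rather than a record of the exact commands we ran.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway repo standing in for `opaque-client` (contents are placeholders).
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.email "dev@example.com"
git config user.name "Example Dev"
mkdir src
echo 'placeholder source' > src/main.txt
echo '*.pyc' > .gitignore
git add -A
git commit -q -m "Initial commit"

# Create the `monorepo` branch and the target sub-directory.
git checkout -q -b monorepo
mkdir client

# `*` skips dotfiles, so `.[!.]*` is added to pick up hidden entries
# without matching `.` or `..`.
for entry in * .[!.]*; do
  [ -e "$entry" ] || continue   # the glob matched nothing
  case "$entry" in
    client|.git) continue ;;    # never move the target dir or Git's own data
  esac
  git mv "$entry" client/
done

git commit -q -m "Move all files into client/"

# The moved file's earlier history is still reachable through the rename.
git --no-pager log --follow --oneline -- client/src/main.txt
```

Because `git mv` only handles tracked files, any untracked files would need a plain `mv` instead.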
Next, we initialize the new monorepo `opaque` on GitHub:

    # Create a new directory named `opaque` from the command line
    mkdir opaque

    # Initialize the monorepo with Git
    cd opaque/
    git init

    # Add the repo we just created on GitHub as the `origin` remote
    git remote add origin git@github.com:opaque-systems/opaque.git
Then, for each repo `repo` we want to add to the monorepo, we:
- Add `repo` as a new remote in the monorepo repository
- Fetch the contents of the new remote
- Merge in the contents of the new remote. This step enables us to copy all of the repo’s code (for example, our Opaque Client’s) into the monorepo while retaining its commit history.
To continue our example from before, here’s what adding the `opaque-client` repo to the monorepo looks like:

    # Add our `opaque-client` repo as the `client` remote
    git remote add client git@github.com:opaque-systems/opaque-client.git

    # Fetch its contents
    git fetch client

    # Merge the `monorepo` branch of the `opaque-client` repo
    # into our new `opaque` repo. This will retain the `opaque-client`
    # repo's history, even in our new repo!
    #
    # It's crucial that we use the `monorepo` branch to end up with the
    # correct structure.
    #
    # Note that the last parameter in this command is of the form `<remote>/<branch>`
    git merge --allow-unrelated-histories client/monorepo
After doing the above, the `opaque` directory has the following structure:
- opaque
  - client/
    - .github/
    - client-docs/
    - demo/
    - python-package/
    - src/
    - .clang-format
    - .flake8
    - .gitignore
    - Dockerfile
    - README.md
    - pyproject.toml
    - requirements.txt
  - README.md
Finally, after following the same steps for the other repos, our new monorepo looks like the following:
- opaque
  - client/
    - .github/
    - client-docs/
    - demo/
    - python-package/
    - src/
    - .clang-format
    - .flake8
    - .gitignore
    - Dockerfile
    - README.md
    - pyproject.toml
    - requirements.txt
  - frontend/
    - assets/
    - src/
    - .eslintrc.js
    - .gitignore
    - ...
  - sql/
    - build/
    - data/
    - project/
    - src/
    - .gitignore
    - ...
  - README.md
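The per-repo steps (add a remote, fetch, merge the `monorepo` branch) can be sketched as a loop. The example below is self-contained and runs entirely against local throwaway repositories, so the paths, the `example-frontend` repo name, and the file contents are all hypothetical stand-ins; with real repos, the remote URLs would be the GitHub SSH URLs instead. It also assumes the monorepo already has an initial commit containing a top-level `README.md`, and passes `--no-edit` so each merge accepts the default commit message without opening an editor.

```shell
#!/usr/bin/env bash
set -euo pipefail

work="$(mktemp -d)"

# Build a stand-in source repo with a `monorepo` branch whose files
# already live under the intended sub-directory.
make_source_repo() {
  local name="$1" subdir="$2"
  git init -q "$work/$name"
  git -C "$work/$name" config user.email "dev@example.com"
  git -C "$work/$name" config user.name "Example Dev"
  mkdir "$work/$name/$subdir"
  echo "code from $name" > "$work/$name/$subdir/app.txt"
  git -C "$work/$name" add -A
  git -C "$work/$name" commit -q -m "Move all files into $subdir/"
  git -C "$work/$name" branch monorepo
}
make_source_repo opaque-client client
make_source_repo example-frontend frontend   # hypothetical repo name

# The new monorepo, seeded with a top-level README.
mono="$work/opaque"
git init -q "$mono"
cd "$mono"
git config user.email "dev@example.com"
git config user.name "Example Dev"
echo "# opaque" > README.md
git add README.md
git commit -q -m "Initial commit"

# Each entry is "<remote name>:<source repo directory>".
for pair in client:opaque-client frontend:example-frontend; do
  remote="${pair%%:*}"
  source_repo="${pair#*:}"
  git remote add "$remote" "$work/$source_repo"
  git fetch -q "$remote"
  git merge -q --no-edit --allow-unrelated-histories "$remote/monorepo"
done

# Every source commit now appears in the monorepo's history.
git --no-pager log --oneline --graph
```

Since each source repo keeps its files under a distinct sub-directory, the merges in this sketch never touch the same paths and therefore do not conflict.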
The last step is to merge all of the `.gitignore` files into one. Simple enough!
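One simple way to do this merge is to concatenate the per-directory files and drop duplicates. The sketch below assumes that approach; the ignore patterns in it are placeholders, not our actual rules. Note that patterns in a sub-directory's `.gitignore` are interpreted relative to that directory, so moving them to the root widens their scope to the whole repo, and any pattern containing a leading or interior slash would need the sub-directory prefixed by hand.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Stand-in monorepo checkout with placeholder ignore rules.
mono="$(mktemp -d)"
cd "$mono"
mkdir client frontend sql
printf '%s\n' '__pycache__/' '*.log' > client/.gitignore
printf '%s\n' 'node_modules/' '*.log' > frontend/.gitignore
printf '%s\n' 'target/' '' '*.log' > sql/.gitignore

# Concatenate the per-directory files, dropping blank lines and
# exact duplicates while keeping first-seen order.
cat client/.gitignore frontend/.gitignore sql/.gitignore \
  | awk 'NF && !seen[$0]++' > .gitignore

# The per-directory files are no longer needed once the root file exists.
rm client/.gitignore frontend/.gitignore sql/.gitignore

cat .gitignore
```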
And just like that, our monorepo contains all the code across our entire codebase and its commit history!
Benefits of the migration
The monorepo migration immediately streamlined our workflows and simplified our codebase. As mentioned in part 1 of the blog post, we initially faced a number of issues due to the polyrepo structure:
- A single change could require multiple synced PRs
- Our release process was cumbersome
- Our CI/CD pipelines required lots of duplicated code
- The process for building documentation was convoluted as a result of the documentation source being scattered across multiple repos.
See part 1 for more details on each of these problems. The monorepo mitigated all of these issues, as explained below:
- Since all code is stored in a single repo, only one PR is ever necessary: this drastically simplifies the process of submitting and testing any changes. In the following section, we give an explicit comparison between the old and new workflows to demonstrate how drastically the monorepo simplified things.
- Having one repo that contained all our code enabled us to drastically simplify our release process. Instead of having to create a release and a tag in 7 different repos, we can now create a release by interacting with only one!
- Having all components in the same repo allowed us to centralize all of our testing/deploying code, vastly simplifying maintenance and improving readability. For example, the code for running build tests across all of our services went from over 1,500 lines of code to fewer than 500.
- Our one monorepo contained the documentation source across our entire codebase, making the documentation build much simpler.
Let’s go through an explicit example of how this actually affected an engineer’s workflow.
Let’s say that one of our back-end engineers is adding an additional parameter to the `Hash` function in our cryptography library. To do this, they will need to modify four modules in our backend codebase:
- `client`: the Opaque client application that enables users to remotely interact with Opaque’s cloud services
- `tms`: the Trust Management Service, an enclave-based cloud service that manages credentials and keys and facilitates multiparty collaboration
- `sql`: Opaque SQL, our secure analytics engine running within and on top of enclaves
- `utils`: a C++ library that contains useful helper functions for cryptography and remote attestation
Let’s compare what the engineer would have to do with our original polyrepo to what they would have to do with our new monorepo.
With our original polyrepo, the engineer would:
- Modify the `Hash` function in the `utils` codebase and open a PR into the `utils` repo.
- Modify any calls to `Hash` in the `client`, `tms`, and `sql` repos and open three new corresponding PRs.
- Test these changes manually (since our automatic CI doesn’t know the git references to the various PR branches): launch a cluster, SSH into each machine, check out the correct branches of each service, rebuild these components, and finally run tests.
- After testing is complete, merge each of these PRs. Note that automatic CI will fail unless all PRs are merged simultaneously.
- If a mistake was made during manual testing, it won’t be caught until all of the PRs are merged, at which point the ecosystem will be in a broken state; roll back the latest commit on each repo and try again.
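With our new monorepo, the engineer would:
- Modify the `Hash` function in the `utils` directory as well as any calls to `Hash` in the `client`, `tms`, and `sql` directories of the `opaque` monorepo, and open a single new PR.
- CI tests will run automatically as all changes are immediately available.
- Merge once all tests pass.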
As you can see, the new workflow is drastically simpler than the old one!
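As a rough illustration of the monorepo workflow, the sketch below builds a miniature stand-in repo and makes the whole change in one commit on one branch. Everything specific in it is hypothetical: the file names, the `Hash(data)` text, the `salt` parameter, and the branch name are placeholders, and `sed` stands in for the real code edits an engineer would make.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Miniature stand-in monorepo; file names and contents are placeholders.
mono="$(mktemp -d)"
cd "$mono"
git init -q
git config user.email "dev@example.com"
git config user.name "Example Dev"
mkdir utils client tms sql
echo 'Hash(data)' > utils/hash.txt
for svc in client tms sql; do
  echo 'call Hash(data)' > "$svc/caller.txt"
done
git add -A
git commit -q -m "Initial monorepo state"

# One branch for the whole change.
git checkout -q -b add-hash-param

# Find every file that uses Hash across the affected directories.
git --no-pager grep -l 'Hash(' -- utils client tms sql

# Update the definition and all callers together.
sed -i.bak 's/Hash(data)/Hash(data, salt)/' \
  utils/hash.txt client/caller.txt tms/caller.txt sql/caller.txt
rm -f utils/*.bak client/*.bak tms/*.bak sql/*.bak

# A single commit, and therefore a single PR, carries the full change.
git commit -q -am "Add salt parameter to Hash"
changed_files="$(git diff-tree --no-commit-id --name-only -r HEAD | wc -l | tr -d ' ')"
echo "Files changed in one commit: $changed_files"
```

Because the definition and its callers change in the same commit, CI always sees a consistent state of the codebase.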
In this part of the blog post, we shared how we managed to centralize our entire codebase that was originally scattered across multiple repos into a monorepo. We then discussed how the monorepo simplified many of the complexities inherent in our original polyrepo structure, and lastly walked through an example to detail the differences between working with a polyrepo and working with a monorepo.
Thus far, we’ve been incredibly happy and satisfied with our monorepo — our lives (and code) have been drastically simplified.