What is a monorepo? What is a polyrepo? To answer these questions, let’s look at the modules that make up a typical application. An application might have many components: a couple of mobile client apps, a web application front end, one or more back-end services, a database and data management layers, and specialized services (such as reporting and administration).
Each of these components has source code associated with it. This source code, accessible to all of the developers responsible for the component, is stored in a code repository. There are many different code repositories out there, and many different ways to manage those repositories. Most of these repositories are based on an open source system known as Git.
So each component has a code repository. The question is, should all of the code for all of the components of the application be stored in a single, companywide or application-wide code repository? Or should each component have its own, individual code repository? Until recently, each component of an application typically would have its own repository. This model is called polyrepo because it involves an application having multiple, independent code repositories.
However, in recent years, some companies, most notably Google, have advocated putting all of the code for all of the components of the application into one large code repository. This code management model is called a monorepo.
There are advantages and disadvantages to both the polyrepo and the monorepo methodologies, and much has been written on this topic. Which one should you use?
In my mind, the traditional polyrepo model is by far superior. The monorepo model encourages many bad habits and bad processes, and it makes application scaling—at least the scaling of the development organizations and the complexity of the application itself—substantially more challenging. Here I’ll offer three big reasons.
Reason #1. Monorepos go against single-team ownership principles
I’m a firm believer in single-team service ownership. I believe that ownership for a service, system, module, or component should belong to a single development team. That team should be responsible for all aspects of that component—designing it, writing it, testing it, deploying it, operating it, supporting it, and fixing it.
This model includes owning all changes to the source code for that component. This doesn’t mean only people on this team are allowed to make changes to the source code for that component—rather, it means all changes made to the component are the responsibility of the owning team. The owning team must have the right of review and approval for all changes. Because they are ultimately responsible for the component, they must be able to manage the code for the component.
This is incredibly hard to enforce when all of the source code for all components is contained in the same repository. If your source code and the source code of your sibling teams are all in the same repository, with the same access permissions, check-in procedure, and approval capabilities, then it will be very hard for you to maintain the ownership and integrity of your component’s source code.
In a polyrepo model, each system component or service that you and your team own has its own distinct repository. As the owner of the component, you own and manage the repository. You decide who can make changes and who cannot. You decide who can accept and approve reviews of code and who cannot. Your ownership of the repository is an important part of the overall ownership required in the single-team service ownership model.
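The ownership rule described above can be sketched in a few lines of code. This is only an illustration, not how any particular repository host enforces permissions: the `ChangeRequest` type, the `OWNERS` table, and the team and component names are all hypothetical. It shows the one property that matters here: anyone may propose a change, but it cannot be merged without approval from the owning team.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeRequest:
    """A proposed change to one component's repository (hypothetical model)."""
    component: str
    author: str
    approvers: set[str] = field(default_factory=set)


# Hypothetical mapping of each component to the members of its owning team.
OWNERS = {
    "billing-service": {"alice", "bob"},
    "reporting-service": {"carol"},
}


def can_merge(change: ChangeRequest) -> bool:
    """Allow a merge only if at least one member of the owning team approved it."""
    owners = OWNERS.get(change.component, set())
    return bool(change.approvers & owners)


# An outside contributor's change is fine once an owner has approved it.
print(can_merge(ChangeRequest("billing-service", "dave", {"alice"})))  # True
# Approval from a sibling team's member does not count.
print(can_merge(ChangeRequest("billing-service", "dave", {"carol"})))  # False
```

In a polyrepo, this table collapses to a single entry per repository, which is what makes the rule straightforward to apply.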
Reason #2. Monorepos encourage bad practices involving massive refactoring
You’ll hear that one of the advantages of monorepos is that they make it easier to refactor extremely large sections of code in a single change request. Big jobs like changing an internal API entry point become significantly easier, because you can update the endpoint, and all the calls to the endpoint, in one giant request.
However, I consider massive changes like this to be bad practice. You should not perform massive refactorings that cross team boundaries in a single giant update. For large projects, such a request usually requires a massive amount of coordination among many different development teams. Often, the change requires several teams to coordinate their release and deployment schedules to make sure the changes all go out at once. This makes individual service deployments nearly impossible to manage.
Instead, if you need to make a sweeping change, like changing the definition of an internal API entry point, you should use a multi-step process. For example:
- Add support for the new entry point, and support the new entry point along with the old entry point. Change the version of the API to reflect this change. Deploy the new version of the entry point and operate both entry points for the time being.
- Communicate to all impacted teams that a new entry point has been published, and that it should be used from now on. Mark the old entry point as “obsolete,” and communicate the scheduled date at which the old entry point will be removed.
- Coordinate with the impacted teams to make sure they implement the change as required to support the new entry point, within the desired schedule. Make sure all teams deploy the changes to use the new entry point. Each team should deploy their support independently.
- Once every team has deployed their updated support, and no one is using the old entry point, you can remove the old entry point from service.
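The steps above can be sketched in code. This is a minimal sketch under stated assumptions, not a definitive implementation: the entry points `get_user` and `get_user_v2`, the shared helper `_load_user`, and the in-process call counter are all hypothetical, and a real service would more likely track usage through its metrics or logs. It shows the old and new entry points operating side by side, the old one marked obsolete, and a simple check for whether the old one is still being called.

```python
import warnings
from collections import Counter

# Hypothetical in-process usage counter; real systems would likely use metrics.
call_counts = Counter()


def _load_user(user_id: int, include_profile: bool) -> dict:
    """Shared implementation behind both entry points (placeholder data)."""
    user = {"id": user_id}
    if include_profile:
        user["profile"] = {}
    return user


def get_user_v2(user_id: int, include_profile: bool = False) -> dict:
    """New entry point, deployed alongside the old one."""
    call_counts["get_user_v2"] += 1
    return _load_user(user_id, include_profile)


def get_user(user_id: int) -> dict:
    """Old entry point: still supported, but marked obsolete."""
    call_counts["get_user"] += 1
    warnings.warn(
        "get_user is obsolete; migrate to get_user_v2 before the removal date",
        DeprecationWarning,
        stacklevel=2,
    )
    return _load_user(user_id, include_profile=False)


def safe_to_remove_old_entry_point() -> bool:
    """True if the old entry point has not been called since counting began."""
    return call_counts["get_user"] == 0
```

Each calling team can switch from `get_user` to `get_user_v2` and deploy on its own schedule. The owning team removes `get_user` only after the check has stayed true over an agreed observation window.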
This is obviously a longer and more involved process than making a single large change to a repository. But think about what you are trying to accomplish. You are changing how an internal entry point functions, and the impact of that change will be felt by many different teams across your entire organization. This sort of change should not be fast and easy to implement—it should proceed slowly, with care.
Making such a sweeping change should require coordinating with each impacted team. Making it easy to push through a large and hugely impactful change such as this is not a good practice. Shortcutting a full change review process could result in unforeseen and potentially long-term problems, and could degrade inter-team communications and trust.
Don’t use “it’s easier” as the excuse. Some things should not be easy, nor fast, given their impact. A multi-step change process will take longer, as it involves more thoughtful and larger-scale evaluation that should not be rushed. Polyrepos can help you mandate this extra care and evaluation. Monorepos allow shortcutting these considerations in ways that are destructive to your organization as a whole.
Reason #3. Small repositories are better than large ones
Large applications have large repositories. If a large application has hundreds of components, and all of those components are in the same code repository, that repository will be huge.
Google’s monorepo is gigantic. Its single repository holds all of the primary Google code: more than two billion lines of code and more than 85 terabytes of data. By line count, this is about 40 times the size of the entire Microsoft Windows operating system.
The larger the repository, the harder it is for each individual engineer to manage the repository while trying to develop code for inclusion in the repository. The more people you have working on a single repository, and the more code changes that repository sees, the more maintenance each individual using that repository must deal with.
In Google’s case, more than 45,000 changes are made to its monorepo every day. The overhead of managing this code grows steeply as the number of developers working on an application grows and the number of components within the application expands.
With a polyrepo application, each component has its own repository, and the number of people who need to work with the repo on a daily basis is small and manageable. Rather than thousands of developers managing thousands of changes while working in a monorepo every day, a typical polyrepo team repository will be managed by five to 10 people, managing perhaps a hundred changes per day in a well-constructed team architecture.
In Google’s case, the challenge of scaling its monorepo is so great that Google had to invent a new tool to manage the repository. It outgrew the off-the-shelf version control system it had been using (Perforce) and built its own tool, named Piper. This tool is specifically designed for handling extraordinarily large code bases. If Google had used a polyrepo model, this would not have been necessary, because each repository in a polyrepo would be small enough for traditional, mainstream code management tools, including Git, to work just fine. No specialized tooling would be needed.
Monorepo or polyrepo: Which is right for you?
There is no right or wrong answer here. There are strong advocates for both views, and there are advantages and disadvantages associated with each approach. Google, in particular, believes its monorepo model is perfect, and it has invested heavily in building a set of tools and processes that work for Google.
However, in a well-constructed development environment, the disadvantages of monorepos far outweigh the advantages. Polyrepos offer a better, longer-lasting, and more scalable environment for a modern application architecture.