Mitch's Blog

Go Modules: Why You Should Stop Worrying about Vendoring

Last modified August 04, 2024

Go Modules have been around since Go 1.11 and are now the official way to manage dependencies in a Go application.

With them came a handful of improvements that make life nicer for developers:

That last point is the one I’ve seen generate some controversy, and it’s the focus of this post!

How Do Go Module Dependencies Work?

A typical workflow for adding a dependency to your Go application could look something like this:

And you’re good to go! Run go run main.go to see the results.

What’s vendoring got to do with all this?

Vendoring is a way to copy the source code of all your dependencies into a local vendor/ directory that lives in the same repository as the application using them.

The Go build tooling can then read those sources directly when compiling, without needing the module cache under $GOPATH or pulling anything over the network.

Now, this isn’t new or especially novel. As far as I know, some other languages have a similar way to pull in dependencies to work with locally:

The difference (as far as I can tell) between Go vendoring and most other languages is that you’re encouraged to commit the Go vendor/ directory into source control.

Criticism and Defense of Vendoring in Go

Vendoring in any language has its share of opponents and supporters:

Here are the common arguments I’ve seen against using and committing the vendor/ directory to source control for Go projects, and my counter-arguments to them.

storing vendored dependencies is a waste of space

While it’s true that vendoring uses up some space by duplicating code that already exists elsewhere, I’d argue it’s not a waste: vendoring brings real gains elsewhere, and the amount of space actually duplicated ends up being pretty small in practice.

Even a stupendously large and complex project like Kubernetes has a vendor/ that clocks in at only 114MB.

Nomad, a “workload orchestrator to deploy and manage” containers and other applications, has 79MB of vendored dependencies.

Generally, a Go project’s vendor/ directory will be in the single- to low-double-digit megabytes. Caddy, a full-featured web server and reverse proxy (among other things), has only 37MB of dependencies.

And storage is cheap! Especially compared to bandwidth, which is what we’re trading off here when vendoring. The Python Package Index would cost $1.8 million a month if Fastly didn’t give it to them for free.

Taking a very unscientific look at AWS S3 pricing, storage costs $0.023 per GB per month vs. $0.09 per GB for bandwidth (after the first GB). Both get discounts at higher volumes, but they never break even: at the most-discounted end, bandwidth is still 4x more expensive!

The counter to that argument is “well your source control provider is still paying for all that extra bandwidth, and it’s duplicated by N repos that pull in that dependency.”

To that, I’d say the cost shift isn’t 1:1. If you run your code through a continuous integration system like GitHub Actions, Drone, etc. and you’re not using vendored dependencies, then you’re pulling every one of them each time a build runs, costing whoever hosts those dependencies money every time.

Often, those hosts are supported by free hardware and bandwidth, donations, corporate backing, or some other less-than-ideal form of funding. Or they piggyback on an existing system, such as using git repositories as the source.

If you’re using an online source control system like GitHub, GitLab, or Sourcehut for both storing source and running builds, it’s likely that bandwidth between their own resources is free, so vendoring actually costs everyone less whenever ingress traffic isn’t free. Even in the best case, where every language package repository has an agreement with the cloud services to provide no-cost transfer between them, it’s a wash.

If you’re really concerned about costing your git hosting service a little extra money, all the ones I’ve seen offer a subscription or some way to donate. Or you can self-host something like Gitea on one of the many cloud providers for a couple of bucks a month.

people will make local changes to vendored dependencies and never commit their changes upstream

The argument here is that some developer finds a bug or wants an enhancement to some library they’re using. Instead of opening a bug report and working with the library developer to get a fix made, they’ll vim vendor/github.com/someUser/coolThing/foo.go, make the desired change there, and that’s the end of it. Upstream never sees the fix, the application is now using a nonstandard “fork” of the library, and everybody loses, right?

While it’s true that you’re essentially relying on people to do the “right thing” and not make local modifications to vendored files, I don’t believe this is a problem in reality.

Local changes only stick around until go mod vendor is run again and replaces the modified files with the upstream versions. To keep a patch alive, someone would need to reapply it every time a dependency gets added or upgraded (since go mod vendor gets run afterwards), or the codebase would have to be fairly unchanging.

All in all, if people really want to use a customized version of some library, they can still fork it and include it using the go.mod replace directive, and as far as I’m aware many other languages support similar substitutions. It’s hardly a Go- or vendoring-specific problem. And besides, sometimes you really do need to make a quick fix and test/deploy it while you wait for a response from upstream, and vendoring makes that a slightly better experience.

vendored files clutter up pull request diffs

This one is true. There’s not a whole lot you can do about the cosmetic problem of diffs that include vendored files. In practice, it hasn’t been an issue for the people I work with on the same Go project. It can also be a good opportunity to review what you’re pulling in, and it can make you think twice when you see a particularly bad snippet of code or notice that some package that’s throwaway (to you) is in fact pulling in thousands of lines of its own dependencies. That’s something I don’t think about as much as I should when not vendoring, and I’d guess other people sometimes have a similar lack of awareness.

Benefits of Vendoring in Go

With that last section out of the way, here are a couple of good reasons why vendoring can actually help make your life, and your builds, better.

vendoring allows for hermetic builds

What’s the tl;dr about that anyways? Essentially, this means your build is self-contained and doesn’t depend on anything outside of its own repo. Two people should be able to pull the same commit from some source control repository, run go build, and get the exact same output.

Now, isn’t this the whole point of dependency lockfiles? Sort of.

You’ll still be depending on network access to pull in dependencies from some package repository, which means you could be susceptible to another left-pad disaster if the repository is down due to maintenance, censorship, or other unforeseen circumstances.

Google has a public Go module proxy cache to help prevent breakage in case someone removes their code, but you’re still dependent on it being up and running, and hoping that Google keeps it running far into the future whenever you try to pull your dependencies again.
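For a sense of what “pulling from the proxy” means in practice: under the GOPROXY protocol, each module version’s source is served as a zip at a URL derived from the module path and version, with every uppercase letter escaped as “!” followed by its lowercase form. Below is a simplified Go sketch of building that URL. The github.com/SomeUser/coolThing module and v1.2.3 version are hypothetical, and unlike the real go command, this sketch doesn’t validate the path first.

```go
package main

import (
	"fmt"
	"strings"
)

// escapeCase applies the module proxy's case encoding: each ASCII uppercase
// letter becomes '!' plus its lowercase form, so URLs stay unambiguous on
// case-insensitive file systems. Module paths are ASCII, so that's all we handle.
func escapeCase(s string) string {
	var b strings.Builder
	for _, r := range s {
		if r >= 'A' && r <= 'Z' {
			b.WriteByte('!')
			b.WriteRune(r + ('a' - 'A'))
		} else {
			b.WriteRune(r)
		}
	}
	return b.String()
}

// zipURL builds the URL a GOPROXY-style server uses for a module version's source zip.
func zipURL(proxy, modPath, version string) string {
	return fmt.Sprintf("%s/%s/@v/%s.zip",
		strings.TrimSuffix(proxy, "/"), escapeCase(modPath), escapeCase(version))
}

func main() {
	// Hypothetical module and version, for illustration only.
	fmt.Println(zipURL("https://proxy.golang.org", "github.com/SomeUser/coolThing", "v1.2.3"))
}
```

If a proxy at that URL is unreachable and you have no vendored copy, that’s exactly the build that fails. The golang.org/x/mod/module package’s EscapePath is the library version of this escaping step.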

Vendoring your dependencies means that no matter what, you can always build your application as long as you have a compatible Go compiler installed. Need to git checkout and build an older version? Your old dependencies are stored right along with it!

vendoring avoids any anonymous pull issues

This one will probably only bite you if you’re trying to include dependencies from a private source control repository. Normally, Go pulls dependencies from the module proxy cache if available, or otherwise fetches them using your regular git credentials. With private repos that live on GitHub Enterprise, for example, there’s no cache available, so you have to use your credentials since “anonymous pulls” aren’t allowed.

This is fine on your personal machine, but becomes an issue when you’re trying to build your application in a Dockerfile or some continuous integration run. Now, you’d have to somehow include a token that has access to pull the private repo. Possibly by storing some secrets and writing them into .gitconfig or .netrc, but either way it’s a pain.
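To give a feel for the .netrc route, here is a minimal Go sketch of the kind of setup step a Docker build or CI job would need when private dependencies aren’t vendored. The GIT_HOST, GIT_USER and GIT_TOKEN variable names are hypothetical, and what your host expects as the login value varies, so treat it as an illustration rather than a recipe.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// netrcEntry formats one .netrc line holding HTTPS credentials for host.
func netrcEntry(host, login, token string) string {
	return fmt.Sprintf("machine %s login %s password %s\n", host, login, token)
}

func main() {
	// Hypothetical variable names; in CI these would come from a secret store.
	host, user, token := os.Getenv("GIT_HOST"), os.Getenv("GIT_USER"), os.Getenv("GIT_TOKEN")
	if host == "" || user == "" || token == "" {
		fmt.Fprintln(os.Stderr, "GIT_HOST, GIT_USER and GIT_TOKEN must be set")
		os.Exit(1)
	}
	home, err := os.UserHomeDir()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// Append to ~/.netrc, creating it with owner-only permissions if it doesn't
	// exist (the mode is ignored when the file already exists).
	f, err := os.OpenFile(filepath.Join(home, ".netrc"), os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o600)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	if _, err := f.WriteString(netrcEntry(host, user, token)); err != nil {
		f.Close()
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	if err := f.Close(); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

Every pipeline that builds the project needs some version of this, plus the secret management behind it.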

With vendoring, that issue goes away, since the private dependencies are already in the repo and nothing needs to be pulled!

vendoring keeps your code close to your dependencies

You might be saying, “Well Duh! That’s the point!” to this one, but hear me out. With all the source code of your dependencies just a quick cd away, it makes it much easier to jump in and check out code to get a better understanding of it or trace a path of calls. Locally, IDEs like Visual Studio Code have support for jumping into the source files in vendor/ to view functions and types, and can offer autocompletion based on the files scanned in there.

On the web, I find it easier to jump around directories in a single repo than hunting down the individual repositories for dependencies (if they even exist anymore) and once there, they’re often filled with code I don’t care about, whereas vendoring only pulls in what’s necessary.

The End

Thanks for reading! I think it’s clear that I’m a supporter of vendoring your Go dependencies, but what does everyone else think? Feel free to send an email to the mailing list and let me know your thoughts on this post (is it stupid? brilliant? extra stupid?) or anything else!

