
I’m going to try my hand at being a serious contributor to open source. This could be neat, or it could just be a one-off and fizzle. Time will tell.
I’ve contributed on-and-off to open source throughout the years – minor patches to dozens of projects, or to various repositories for work. However, I’ve never been deliberate about it. I’ve always liked writing code in my spare time, but I wouldn’t really call this “developing software”. That’s been reserved for work time.
Now that I’ve been in the industry for a decade, I’m finding myself yearning for my time spent coding to have some meaning: contributing to things other people find useful, interacting with people, and building something bigger than I ever would alone.
Getting my feet wet
So the past week I opened one issue and wrote 9 pull requests across 4 repositories.
Open Location Code
- https://github.com/google/open-location-code/pull/749
- https://github.com/google/open-location-code/pull/750
- https://github.com/google/open-location-code/issues/751
- https://github.com/google/open-location-code/pull/752
Open Location Code is a multi-language library for encoding latitude/longitude locations in a format that is both human-readable and efficient for computers to search.
I’ve previously contributed some performance optimizations to the Go Encode implementation, so my first bit of work in getting back into things was contributing a performance optimization to the Decode implementation. Rather than bombard the maintainers with a single mega-optimization PR or a bunch of smaller ones at once, I’m going to do those piecemeal. So I really have about 3-4 lined up after that one, but I’ll wait for that first one to get merged to start sending the others.
The other two PRs (and issue) address things I found while doing optimizations – making logic consistent across the library’s eleven(!) implementations and adding Rust to the testing instructions.
go-cmp
- https://github.com/google/go-cmp/pull/389
- https://github.com/google/go-cmp/pull/390
- https://github.com/google/go-cmp/pull/391
go-cmp is a test library for Go that provides nice quality-of-life functions for comparing objects for semantic equality and printing nice differences to aid in debugging.
My first contribution here was pretty minor – I noticed there were some calls to deprecated methods, so I updated them. That got merged very quickly (within a day!), so I decided to look around for other things that needed doing.
I found two TODOs that I could easily resolve, so I quickly wrote and sent two more PRs. They were approved, but confusingly haven’t been merged. This isn’t uncommon – much of open source maintenance is voluntary, so there’s no expectation of a fast turnaround. It’s also plausible that, since I now have “contributor” next to my name, the reviewer may have thought I have merging rights. I’ll ping in a week if there’s no movement.
go-licenses
go-licenses is a license checker for Go repositories. Since open source projects can have just about any license, it’s important to make sure you only depend on projects that have licenses you’re fine with. In particular, many companies like Google use this to ensure their projects don’t depend on unapproved licenses that may cause intellectual property problems.
In looking through the library I noticed one of the subcommands of the CLI uses os.Open to check if a file exists. When the file does exist, this opens it and returns a file handle that then needs to be closed. os.Stat is a better fit, as it can check for a file’s existence without opening it.
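To make the difference concrete, here is a minimal sketch of an os.Stat-based existence check. This is not the code from go-licenses or from my PR; the helper name pathExists and the missing file name are hypothetical, picked only for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
)

// pathExists (hypothetical helper) reports whether path exists.
// os.Stat only reads file metadata, so there is no handle to close afterwards.
func pathExists(path string) (bool, error) {
	_, err := os.Stat(path)
	if err == nil {
		return true, nil
	}
	if errors.Is(err, fs.ErrNotExist) {
		// The path is simply missing; that is an answer, not a failure.
		return false, nil
	}
	// Anything else (for example a permission error) means we could not tell.
	return false, err
}

func main() {
	// "." is the current directory; the second name is assumed not to exist.
	for _, p := range []string{".", "no-such-file.hypothetical"} {
		ok, err := pathExists(p)
		fmt.Println(p, ok, err)
	}
}
```

Separating "does not exist" from other errors matters: a permission error does not mean the file is missing, so the sketch hands that case back to the caller rather than guessing.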
This PR is a bit of a stretch – it’s a pretty minor improvement, and following the repo’s test patterns made it pretty big for what it is. I’m curious what the response will be.
keep-sorted
keep-sorted is another of those tools used by a huge number of projects. It’s used across the Google and open source ecosystems to do a fairly simple – but important – task: keep lists of things sorted. Scanning a list of tens, hundreds, or (yes) thousands of items to see if something exists, or to see the changes between two lists, is difficult for humans looking at unsorted lists. Consider how much easier it is to tell which letter of the alphabet is missing from a list of letters if they are randomized versus if they are in order:
- XWKGZUYJVMSQTREFPBDIALONC
- ABCDEFGIJKLMNOPQRSTUVWXYZ
For this repo, I looked for an open issue that seemed to have maintainer support and found https://github.com/google/keep-sorted/issues/80. Basically, someone wanted to know if the feature that skips lines at the start of a list could be extended to also support skipping lines at the end. A maintainer expressed support for the idea, but hasn’t implemented it yet. So, I made a quick change that added that feature. I only did this one today and I see the project has been updated fairly recently – I’m hopeful this one will be reviewed pretty soon.
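The idea behind that feature can be sketched in a few lines of Go. This is only an illustration under my own assumptions, not keep-sorted’s implementation or its option names: the function sortSkipping and its parameters skipHead and skipTail are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// sortSkipping (hypothetical) returns a copy of lines in which the first
// skipHead and the last skipTail entries stay where they are, and only the
// entries between them are sorted.
func sortSkipping(lines []string, skipHead, skipTail int) []string {
	out := append([]string(nil), lines...) // copy so the input is not modified
	if skipHead < 0 {
		skipHead = 0
	}
	if skipTail < 0 {
		skipTail = 0
	}
	if skipHead+skipTail >= len(out) {
		return out // nothing left in the middle to sort
	}
	sort.Strings(out[skipHead : len(out)-skipTail])
	return out
}

func main() {
	// A table with two header lines and one footer line that must not move.
	table := []string{
		"| name | role |",
		"| ---- | ---- |",
		"| zoe | dev |",
		"| amy | ops |",
		"| total: 2 | |",
	}
	for _, line := range sortSkipping(table, 2, 1) {
		fmt.Println(line)
	}
}
```

Here the header and footer rows keep their positions, and only the two data rows in the middle are reordered.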
Reflecting on “Successful Repositories”
In my time within academia, some of the work I studied was on understanding open source. I think there’s a lot that academics could learn by seriously interacting with the open source community. The past week I’ve been thinking about how the work I read measured things like “success”. I’m going to avoid citing particular papers – if I ever do something like that it’ll be in a more formal work.
I consider all four repositories I worked on to be extremely successful, but they “fail” at quite a few of the success metrics used in academic works.
A common metric for measuring success is the number of recent commits/contributors. This is usually measured as the number of times code has been merged into the main repository, or the number of unique people who have made code contributions. On the surface, this seems to make sense: shouldn’t “successful” repositories constantly be having lots of code sent to them, and by lots of different people? I’d consider go-cmp to be a hugely successful repository, but it doesn’t even have 200 commits over its almost 10-year life! The thing it’s doing – showing the difference between objects – is simple, elegant, and hugely useful. It’s used extensively. The tight scope of what the repository is trying to do (and the low reliance on volatile dependencies) means it simply rarely needs changes.
Instead, I think “success” is unique to what a repository is trying to do. Something like go-licenses is pretty simple and does a useful thing well, and so it sees a lot of use. It’s often not even a code dependency (one where it is called as a library) but a workflow one (it is used as a tool), messing up another common success metric! Conversely, repositories like some of the ones for Kubernetes see thousands of commits per year, and they would be unsuccessful without that. Kubernetes is a huge infrastructure project that has to integrate with an enormous number of other pieces of software and do some really complicated tasks well enough for people to trust it with billions of dollars of infrastructure. And most projects that depend on Kubernetes don’t directly rely on its code – they rely on machines running its code for them.
There is a notion of use here (all of the projects I’ve mentioned above are used by thousands of projects), but also one of self-determination. A project’s goal can tell us a lot about what it is trying to do, and whether it is achieving that. Projects differ in how they are designed to be used.
I have more thoughts (What makes a “healthy” repository? What things make a repository attractive to a newcomer? How do you get started in open source?), but they don’t fit cleanly here. I’ll have to cover them another time.
Next Time?
This was a fun week, and I’ve got the next week off from work for my move, so I’ll probably spend some of my free time doing more hacking on open source. We’ll see.

























