Git commits and how to craft them

Git isn’t the easiest piece of software, so many people treat it as a necessary evil and use only the bare minimum they can get away with. But if you treat it right (and practice a little), git can be your friend.

In the last couple of years I’ve taught git to students and coached colleagues on it. I was exposed to several different workflows on a variety of git forges and was involved in hammering out the workflow for a couple of projects. I was also fortunate enough to work with some people who know almost everything about git. Naturally, all of this involved a lot of discussion and thinking about git, which made me realize that a little bit of effort can go a long way when dealing with commits.

Below you will find the synthesis of these discussions and experiences: a set of rules, their rationale and a short practical guide to creating useful commits.

The rules

The reasons behind these rules are a combination of a) fully exploiting git tooling, b) a courtesy to your future self, current and future colleagues, and c) consistency. Of course they only apply to the final, finished commits, but more on this later.

The commit title should be short, ideally around 50 characters, but definitely less than 72.
Use a short prefix in the commit title that places the commit’s scope within the wider project (e.g.: ci:, ui:, train:, doc:).
The commit body should be wrapped at 72 chars.
The commit body should explain why the commit is needed, in as much detail as necessary (picture writing it for a non-senior, recently onboarded colleague with no detailed knowledge about the code).
The commit body should make use of relevant commit-trailers.
Each commit should be self-contained, changing one well-scoped part of the code (called an atomic commit).
Each commit should produce working code, even if you are working on a chain of them.
The commit (title and body) should be written in the imperative mood, as if you were instructing git on what to do (“Fix bug” instead of “Fixed bug”).
You should usually avoid using merge commits. Instead, you should always rebase first and then fast-forward or apply patches.
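To make the first few rules concrete, here is a minimal sketch of a commit that follows them, created in a throwaway repository. The file, the prefix, the message text and the link are all made up for illustration; only the shape matters.

```shell
set -eu
# Throwaway repository; the file, prefix, message and link are made up.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "retries: 3" > ci.yml
git add ci.yml

# Title: scope prefix, imperative mood, well under 72 characters.
# Body: explains why, hard wrapped at 72 characters, ends with a trailer.
git commit -q -F - <<'EOF'
ci: retry flaky network steps three times

The package mirror times out a few times a week, which fails the
whole pipeline although nothing is wrong with the code. Retry the
download step so that a single timeout no longer blocks a merge.

Link: https://example.com/ci-timeouts
EOF

# Measure the title and the longest body line.
title_len=$(git log -1 --format=%s | awk '{ print length($0) }')
body_max=$(git log -1 --format=%b | awk '{ if (length($0) > m) m = length($0) } END { print m + 0 }')
echo "title: $title_len characters, longest body line: $body_max characters"
```

A check like the last three lines could also live in a commit-msg hook, but that is a design choice for your project and not something the rules require.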

Rationales

Make git log readable

On my laptop, a split screen terminal is around 90 characters wide. Running git log --oneline will prefix the title with 8 extra characters (7 characters of hash and a space), and if the first commit is actually in sync with the origin, the ref decoration (HEAD -> master, origin/master, origin/HEAD) brings the prefix to 53 characters. If you want to add the date and the commit author to the line as well, that leaves even less space. Thus it makes sense to keep your title short and the body hard wrapped.

git log --oneline

This is of course not just about your command line. Since short titles and hard wrapped commit bodies have pretty much always been the norm, most git forges (github, gitlab, bitbucket etc.) will also wrap commit titles after 72 characters, making longer commit titles particularly unreadable online.

Make git blame useful and code review easier

Git blame shows which commit modified each line of a file last. It can be incredibly useful when you’re staring at a piece of code and you have no idea why it was written like that. You check the commit with git blame and realize that oh, that strange condition was added two years ago to work around a legacy system that is not around anymore.

git blame

Obviously, this only works if the commit message actually explains why the change was made (I really like this story about a good commit message). Moreover, if this workaround was added in a well scoped commit (instead of five successive commits with only the last one actually getting it to work), then dropping the extra complexity of the now redundant workaround is trivial with git revert.

Having commits that are explained in detail and are well scoped also helps with reviewing them, compared to trying to make sense of several unrelated changes without any explanation.
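The blame-then-revert flow can be sketched as follows, in a throwaway repository. The file names, config keys and commit titles are invented for the example; the assumption is that the workaround sits in one well scoped commit.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "timeout: 30" > app.conf
git add app.conf
git commit -q -m "train: add scheduler config"

# The workaround lives in a single, well scoped commit.
echo "legacy_workaround: true" >> app.conf
git commit -q -am "train: work around legacy scheduler"

echo "notes" > README
git add README
git commit -q -m "doc: add readme"

# Much later: which commit last touched line 2 of app.conf?
culprit=$(git blame --porcelain -L 2,2 app.conf | head -n 1 | cut -d ' ' -f 1)
culprit_title=$(git log -1 --format=%s "$culprit")

# The legacy system is gone, so undo exactly that commit.
git revert --no-edit "$culprit" > /dev/null
echo "reverted: $culprit_title"
cat app.conf
```

Had the workaround been smeared over several commits mixed with other changes, the revert would either drag unrelated changes along or end in conflicts.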

Make git bisect work

You are in a situation where somewhere along the line during development a bug was introduced into an already implemented feature and you are not sure where. Using git bisect you can mark the current commit as “bad”, mark a commit in the history where the feature was still working as “good”, and git will help you quickly narrow down the commit that introduced the bug. It will walk you through a binary search of the history between the “good” and the “bad” commit, to find the first commit after the known “good” one where the feature was broken. You either test each offered commit manually, or you can automate the process with a script.

Bisecting is incredibly useful, and can save a lot of time when hunting bugs, but only if the commit history is in good shape (for tutorials, see here). Bisecting when there are commits where the code is just plain not working is hard and easy to get wrong, because you can’t test your specific feature if the entire software is broken. Automated bisecting is probably out of the question.

Of course, if bisect is successful, but the commit at fault is not well scoped and lacks an explanation of the changes, you might still be in trouble. Figuring out how to fix the bug will be harder and reverting the commit is likely off the table.

On avoiding merge commits

First, avoiding merge commits has a readability aspect to it. In the most extreme case of having a merge commit after every proper commit the merge commits carry zero information, but take half the screen real-estate. This is true for less extreme cases as well: as long as the proper commits are well crafted the merge commits will carry little to no information even after long stretches of commits.

Second, merge commits make keeping two branches in sync irritating. Any time commits from a source branch are merged with a merge commit to a target branch, the source branch will need to retrieve the merge commit to keep in sync.

I’d argue that merge commits only make sense for very long lived branches with a large number of commits. For example branches being worked on for several months with 50+ commits, but please don’t quote me on the exact numbers.

If you still want to have merge commits for some reason, at least rebase before merging.

Miscellaneous

We’ve already touched on well scoped commits making git revert much more useful. This is of course also true for git cherry-pick. If you are in a situation where you need to maintain slightly different versions of your program in parallel, git cherry-pick is a great tool, but only if commits really make sense on their own.

The usage of commit trailers (when relevant) is just good practice, since git has tooling around them (see git interpret-trailers). The most common ones are widely used in automation or understood as having a specific meaning. The most famous trailer is probably Signed-off-by:, you’ve likely also seen e.g. Fixes: for automatically closing issues, but things like Co-authored-by: for acknowledging help or Link: for adding external links can also be useful.
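A small sketch of the trailer tooling: git interpret-trailers can append trailers to a message and parse them back out. The message, the issue number and the co-author are made up.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q

# A commit message without trailers; text and names are made up.
msg=$(mktemp)
printf 'ui: fix overlapping labels\n\nLong labels were drawn on top of each other.\n' > "$msg"

# Append two trailers to the message.
with_trailers=$(git interpret-trailers \
    --trailer "Fixes: #123" \
    --trailer "Co-authored-by: Jane Doe <jane@example.com>" \
    "$msg")

# Parse the trailers back out, one per line.
parsed=$(printf '%s\n' "$with_trailers" | git interpret-trailers --parse)
printf '%s\n' "$parsed"
```

Because the trailers are machine readable, the same parsing step can feed automation such as issue closing or changelog generation.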

The last thing I haven’t touched upon is why to write commits in the imperative mood. Well, this is mostly just aesthetics, and the core of the rule is actually to write commits consistently with the rest of the project. That said, most projects do not have rules for this, and where they do, I have so far only seen ones that require the imperative. Git itself also uses the imperative in automatic commits (e.g. git revert creates titles like Revert "bad commit"). Finally, to me it feels more natural to read the imperative during a review than, say, the past tense (the proposed change has clearly not yet been applied, right?). I always picture instructing git on what to do with the codebase when the commit is being applied.

Comments on external constraints

Working with pull/merge request based forges

Most Git forges have workflows based on pull/merge requests (think e.g. Github, Gitlab, Bitbucket). The conceptual problem I have with these is that their main focus is on handling differences between branches instead of individual commits. Their default views treat all the commits as one big blob of changes making it hard to review commit by commit. It’s usually not even possible to directly comment on the commit messages themselves, only on the code changes (i.e. most of them support commenting on a specific line of code, but not the commit title or body). This promotes not paying much attention to an individual commit, but you still can and should.

Another thing these forges incentivize is merge commits. They offer a setting for merge/pull requests called the merge strategy, and the default setting is always merging with a merge commit. Fortunately, this can be changed to fast-forwarding. In this case, instead of merging, the commits from the source branch are replayed on top of the target branch, creating a nice linear history (of course for this you first need to make sure the source branch is rebased on top of the target branch). If you really can’t live without merge commits, an alternative option is having a semi-linear history with a rebase-and-merge strategy.
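On the command line, the fast-forward strategy boils down to a rebase followed by a fast-forward-only merge. A minimal sketch with two local branches (branch and file names are made up, and git init -b assumes Git 2.28 or newer):

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q -b master
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo base > base.txt
git add base.txt
git commit -q -m "doc: add base file"

# Feature branch with one commit.
git checkout -q -b feature
echo feature > feature.txt
git add feature.txt
git commit -q -m "ui: add feature file"

# Meanwhile master moves on, so the two branches have diverged.
git checkout -q master
echo other > other.txt
git add other.txt
git commit -q -m "ci: add other file"

# Replay the feature commit on top of master, then fast-forward master.
git rebase -q master feature   # checks out feature and rebases it
git checkout -q master
git merge -q --ff-only feature

merge_commits=$(git rev-list --merges --count HEAD)
git log --oneline
echo "merge commits in history: $merge_commits"
```

The --ff-only flag makes the merge refuse to run if a merge commit would be needed, so a forgotten rebase shows up as an error instead of as an extra commit.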

An interesting option for merge strategies is to squash the entire pull/merge request into a single commit, before merging or fast-forwarding. This may actually be a valid strategy, but that means that now this squashed pull/merge request is your commit, thus everything we have talked about above should now apply to this single resultant commit. I think it’s usually more viable to just disable this option and do any necessary squashing manually.

For a more detailed and visual explanation of the merge strategies, see e.g. here.

As a quick side note: if you didn’t even know that pull/merge requests are actually not the only way to collaborate using git, you might find this an interesting read, and I highly recommend that you try out the interactive git over email tutorial (also check out aerc, which, in my obviously totally unbiased opinion, is the best email client for git over email). I found that the different perspective of git over email greatly helped my understanding of how git works, so even if you never use it in practice, it’s worth doing the tutorial at least. Also, sometimes it’s just easier to send someone a patch by email than to set up a branch for the commit, push that branch somewhere and then email them the link.

Working with conventional commits

The core idea of conventional commits is to write commits so that semantic versioning and changelogs can be automated. My problem with this is that versioning and changelogs are user facing documentation, while the git history is developer facing documentation. This is like when corporate requires you to create “predocuments”, those powerpoint presentations that must also serve as a report/document, making you end up with a wretched abomination that is good for neither. If you want an automatic changelog, I suggest documenting changelog entries within the commit, using git trailers (this is how aerc handles this). Of course this is more work, so if you’re going for automatic versioning and changelogs with the least effort, I won’t blame you for going with conventional commits, but otherwise, I don’t think it makes much sense. That said, the breaking change trailer and the ! in the title for breaking changes are nice ideas.
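A rough sketch of the trailer-based changelog idea. The trailer name Changelog-added is an assumption chosen for this example, as are the commits themselves; check your project's contributing guide for the names it actually uses.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

# A user facing change carries a changelog trailer (hypothetical name).
echo "dark" > theme.txt
git add theme.txt
git commit -q -m "ui: add dark mode" \
    -m "Reading logs at night with a white background is painful." \
    -m "Changelog-added: Dark mode for the settings page"

# A purely internal change carries none.
echo "v2" > runner.txt
git add runner.txt
git commit -q -m "ci: bump runner image" \
    -m "The old image is no longer supported upstream."

# Collect the trailer values, oldest first, and drop empty lines that
# come from commits without the trailer.
changelog=$(git log --reverse --format='%(trailers:key=Changelog-added,valueonly)' | sed '/^$/d')
echo "Added:"
echo "$changelog" | sed 's/^/- /'
```

The commit title and body stay written for developers, while the trailer holds the sentence meant for users.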

If you do use conventional commits, you can still stick to most of our rules: just make the optional scope (which is like our prefix) and the optional message body required instead.

In practice

The size (and number) of commits does not matter

You may have noticed that the rules do not include anything about the size of commits, or the number of commits. Indeed, the requirement that each commit be well scoped and that each commit should lead to working software may lead to situations which you might find surprising at first.

It is entirely possible that a commit is only a single letter change, even if you change the same file in the next commit. A practical example: you introduce a new feature to your program that requires adding several new parameters to a configuration file, and, completely unrelated to this new feature, you realize you need to change an already existing parameter in this configuration file. These changes should be in two separate commits, the first explaining why the feature is being added, the second explaining why the existing parameter must change.

A patch series (pull request) may have multiple commits touching the same line. Again a practical example: your project uses a templating language, which is already used in several places, but you realize that to implement the new feature that you want, you need to change to a more powerful templating language. This should be done in two commits: the first switching the entire codebase to the new language without touching the existing features, and the second actually adding the new feature. Inevitably, both commits will touch the lines where the new feature was actually required.

Staging hunks instead of files

Getting started tutorials will teach you how to stage files with git add. But it is possible to only stage some changes in a file using git add -p (see here on how to use this) or even more granular with git add -e. Although these commands can be directly used, I find that this is where a good visual editor integration is the most helpful.

Staging hunks instead of files allows you to split your changes into those well scoped commits.

Actively rewrite history

As long as the branch/repository you are pushing to is not one that is consumed by users (e.g. it is a feature branch and not the dev or master branch) it is absolutely okay to force push. Also, be not afraid! The git reflog will have all your commits for a long time, even if they are currently unreachable from your branch. If you diligently commit, nothing will be lost, even if something goes wrong.

When working on a feature that can be done in a single commit, the best strategy is to commit early and then constantly edit this commit as you make changes. To add your changes to the latest commit instead of a new one, you git add as usual, and then commit with git commit --amend or git commit --amend --no-edit if the commit message is fine. This is actually less work than adding a new commit, if you already have a good commit message and everything is already tidy.
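The amend loop, sketched in a throwaway repository (file and message are made up). It also shows that the replaced commit is still around in the reflog.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

# Commit early.
echo "draft" > notes.txt
git add notes.txt
git commit -q -m "doc: add notes"
old=$(git rev-parse HEAD)

# Keep working, then fold the new changes into the same commit.
echo "final" > notes.txt
git add notes.txt
git commit -q --amend --no-edit

commits=$(git rev-list --count HEAD)
# The pre-amend commit is unreachable from the branch, but not lost.
old_in_reflog=$(git reflog --format=%H | grep -c "$old")
echo "commits on branch: $commits, old commit found in reflog: $old_in_reflog"
```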

If you need to split the latest commit you can do the reverse of git add -p with git reset -p. Alternatively you can run git reset HEAD^, which will reset git to the previous commit, but leave your changes in the working directory (git reset --soft HEAD^ is also an option). The ^ means the parent commit, so the one before HEAD, and HEAD is a reference to the current commit (see here in detail how to reference specific commits). If you already had a nice commit message, then resetting will drop it, but you can still retrieve it by using git commit --reuse-message=ORIG_HEAD, or using any other way to reference a specific commit instead of ORIG_HEAD.
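Splitting the latest commit while keeping its message might look like this. File names and messages are invented; the first commit only exists so that HEAD^ has something to point at.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo base > base.txt
git add base.txt
git commit -q -m "doc: add base file"

# One commit that accidentally contains two unrelated changes.
echo a > a.txt
echo b > b.txt
git add a.txt b.txt
git commit -q -m "train: add a" -m "Explain here why a is needed."

# Undo the commit but keep both changes in the working directory.
git reset -q HEAD^

# First commit: only a.txt, with the original message taken from
# ORIG_HEAD (set by git reset to the commit we just left).
git add a.txt
git commit -q --reuse-message=ORIG_HEAD

# Second commit: the unrelated change, with its own message.
git add b.txt
git commit -q -m "train: add b" -m "Explain here why b is needed."

git log --oneline
```

Note that --reuse-message also reuses the original authorship information, including the timestamp.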

If you’d rather just first use a lot of git commit -m "wip" to save your work for later, you could also git reset origin/master (assuming you branched off from master) and go about staging hunks into an appropriate number of commits in one go. If origin/master has actually progressed away from where you branched off, then resetting to the first common ancestor can be done by running git reset `git merge-base origin/master HEAD`: the expression within the backticks evaluates to the first common ancestor of origin/master and your current branch, i.e. the commit you originally branched off from.
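A sketch of the wip-then-reset approach. There is no remote in this throwaway repository, so the local master branch stands in for origin/master; branch names, files and messages are made up, and git init -b assumes Git 2.28 or newer.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q -b master
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo base > base.txt
git add base.txt
git commit -q -m "doc: add base file"

# Save work on a feature branch with throwaway "wip" commits.
git checkout -q -b feature
echo "parse v1" > parser.txt; git add parser.txt; git commit -q -m "wip"
echo "usage"    > docs.txt;   git add docs.txt;   git commit -q -m "wip"
echo "parse v2" > parser.txt; git commit -q -am "wip"

# Meanwhile master progresses.
git checkout -q master
echo other > other.txt; git add other.txt; git commit -q -m "ci: add other file"
git checkout -q feature

# Reset to where the branch forked off; all changes stay on disk.
fork_point=$(git merge-base master HEAD)
git reset -q "$fork_point"

# Recommit the work as well scoped commits.
git add parser.txt; git commit -q -m "train: add parser"
git add docs.txt;   git commit -q -m "doc: describe parser usage"

git log --format=%s "$fork_point"..HEAD
```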

A more complicated situation is when you need to edit commits that are not the last one. If all hell breaks loose you can always just reset to the root of your branch like above, but usually there is an easier option with git rebase. If you want to change commit 6b8ed1e0 that is a couple of commits back, you can create a fixup commit by staging the necessary changes and running git commit --fixup=6b8ed1e0. This adds a new special commit, so that if you run git rebase --autosquash (usually something like git rebase --autosquash origin/master), git will know to squash this new commit into 6b8ed1e0. This --fixup has a couple of variants and there is also --squash, see man git-commit.
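The fixup flow, sketched end to end. Files and messages are made up. The rebase is started with --interactive and a no-op sequence editor so that the sketch runs unattended and also works on Git versions where --autosquash only takes effect in interactive mode.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo base > base.txt
git add base.txt
git commit -q -m "doc: add base file"

echo "color: red" > ui.txt
git add ui.txt
git commit -q -m "ui: set color"
target=$(git rev-parse HEAD)

echo "epochs: 10" > train.txt
git add train.txt
git commit -q -m "train: set epochs"

# Oops, the color in the older commit was wrong. Record the correction
# as a fixup commit that points at the commit to be changed.
echo "color: blue" > ui.txt
git add ui.txt
git commit -q --fixup="$target"

# Let git move the fixup next to its target and squash it in.
# GIT_SEQUENCE_EDITOR=true accepts the generated todo list unchanged.
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash "$target^"

git log --format=%s
```

After the rebase the history has three commits again, and the second one already contains the corrected color.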

The most versatile option is using git rebase --interactive. This allows you to do all of the above, and more, e.g. edit a specific commit during rebase.

For more details I recommend this excellent tutorial on rebasing.

Some of the above in practice as a not-so-smooth asciinema cast:

Recommended settings

These settings will make life easier (see my git config):

git config --global rebase.autoSquash true # autosquash by default in interactive rebase
git config --global rebase.autoStash true # stash and reapply unsaved changes during rebase
git config --global pull.rebase true # do a rebase when pulling from remote
git config --global rerere.enabled true # remember resolved merge conflicts

See this about rerere in detail. It’s also worth noting that you can create short aliases in git for long commands you use often.
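As an illustration of aliases, a small sketch. The alias names lo and fix are arbitrary examples. In practice you would probably pass --global; here the aliases are set per repository so that the example does not touch your real configuration.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

# Hypothetical alias names; pick whatever is short and memorable.
git config alias.lo "log --oneline"
git config alias.fix "commit --fixup"

echo hello > hello.txt
git add hello.txt
git commit -q -m "doc: add greeting"

# "git lo" now expands to "git log --oneline".
out=$(git lo)
echo "$out"
```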

The documentation for most available options is in man git-config or in the specific tool’s man page.

Tooling

Set up your shell so that git commands, especially ones that take commit references, have good autocomplete, e.g. with fzf search (I use this on zsh).

Set up your editor, so it’s easy to stage hunks, do amends, rebase to specific commits etc. Unfortunately, I can only make recommendations for vim, but I’m quite sure that most editors have at least decent git support. For vim I use fugitive, vim-flog which is basically a fugitive extension for viewing and interacting with the git log, and vim-gitgutter for visually staging hunks (see my config).

I also find lazygit useful; it is definitely worth checking out.

How seriously should I take this?

That depends. Chances are that even the most serious projects will start with git commit -m "initial commit". The longer your project is likely to be around, the more people are likely to interact with it in the future, and the more collaborators you currently have on the project, the more you should take it seriously. And of course, not all parts are equally important or applicable to every particular situation, especially with external constraints like already existing tooling and conventions in your team/company.

Personally, practising the above has saved me hours of debugging even on smaller projects, so I try to be diligent from the start, but your mileage may vary. At first it will probably be harder than just churning out commits, but with some practice the overhead quickly becomes minimal. Not to mention that, since basically every project and company uses git, it’s a pretty directly transferable skill, unlike that (n+1)th framework you learnt yesterday.

References

Tim Pope on commit messages
linux preferred commit style
aerc contributing instructions and aerc history as an example
and of course if you want a deeper understanding, you should read the Pro Git book

Acknowledgments

Thanks to Koni Marti, Robin Jarry and István Papucsek for reading the first draft.