Show HN: "Git who" – A new CLI tool for industrial-scale Git blaming

252 points by weebst 4 months ago

I've always wanted a better way to explore the authorship data embedded in a Git commit log. I'm having fun building a CLI tool to do this.

It's a bit like the "Contributors" tab on GitHub, which shows how many commits each contributor has made, but much faster and with many more options.

If you get a chance to try it out, please let me know. I'd love to hear feedback and suggestions. Thank you!

m000 4 months ago

Lovely tool. Thanks for the release! I had some fun with it before calling it a day.

Here is my personal wishlist after a short test-drive.

- Blame-based stats. While it is nice to see an overview of the historical contributions of Bob and Alice, this is not something that I would use on a daily basis. What would be more useful is to present the same tables based on the blame lines of a tree-ish. This would show the de facto "owner(s)" of modules/files, something which comes in handy when asking for help with something or even assigning reviews. One could also run this iteratively over the history and get some nice timeline graph.

- Support for pattern-based inclusions/exclusions. E.g. I am not interested to see stats on the json files used by tests. Or any kind of auto-generated files (e.g. django migrations).

- Support for a configuration file, to store your preferred settings in your git repo. Something TOML-based perhaps.

- Better packaging (nit). E.g. the linux tarball for v0.6 contains some apple-related "junk" and gnu tar complains about archive format incompatibilities.
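
A rough sketch of the blame-based stats idea from the first bullet, using only stock Git and POSIX tools. The helper names `count_blame_authors` and `blame_owners` are made up for illustration and are not part of git-who; file names containing unusual characters are not handled.

```shell
# Tally surviving lines per author from `git blame --line-porcelain` output on stdin.
count_blame_authors() {
  # In --line-porcelain mode every blamed line carries an "author <name>" header.
  # Content lines start with a TAB, so they can never match this pattern.
  sed -n 's/^author //p' | sort | uniq -c | sort -rn
}

# Blame every file under a path at a given tree-ish and tally the authors.
# Usage: blame_owners <tree-ish> <path>
blame_owners() {
  git ls-tree -r --name-only "$1" "$2" |
    while IFS= read -r file; do
      git blame --line-porcelain "$1" -- "$file"
    done | count_blame_authors
}
```

Something like `blame_owners HEAD src/` would then list surviving lines per author under `src/`, most lines first. Blaming every file is slow on large trees, so treat this as a starting point rather than a replacement for a dedicated tool.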

weebst 4 months ago

I'm very happy to hear you had fun with it. Thank you for the comprehensive feedback and for trying it out!
Not sure what's happening with the tarball. Will take a look at that.

dspillett 4 months ago

One thing to note, maybe particularly with reference to the “who wrote vim” analysis⁰, is that depending on workflow this can attribute more to internal contributors than you might assume by the results.

If a patch or pull request is sent but an internal contributor (the only internal contributor in Vim's case for a long time) reformatted it before merging then the same work is double-counted (if full history is kept) or only attributed to the reformatter (if history is squashed during/before merge).

This doesn't make the result wrong, of course, the tool is doing exactly what it says on the tin. But it does mean that, without reviewing the contribution process (current and past), you might need to be less definite¹, when stating a meaning derived from the result, because how the result is interpreted might not be quite right given the input data available.

----

[0] https://sinclairtarget.com/blog/2025/03/who-will-maintain-vi... in case you skipped by the link when looking at git-who's readme.

[1] perhaps just by giving caveats to make sure that the reader has sufficient context regarding the limit of the process

lionkor 4 months ago

By the way, git blaming is really misunderstood by a lot of people; it's NOT about who did it, it's about which commit is to blame -- that's different.

ants_everywhere 4 months ago

I'm pretty sure it's about who wrote the code.
'git blame' is named after the subversion and CVS blame features that do the same thing. Subversion docs are clear that it's a snarky name and that 'svn praise' and 'svn annotate' are neutral synonyms.
Perhaps someone familiar with CVS can comment on its history there since it seems to be the first source control to add it.
EDIT: and one of the main reasons it's a useful feature is it tells you who to talk to to understand a piece of code, or to coordinate a roll back, or to do any other sort of communication. It probably matters more in a big company where code is changing frequently and you're unlikely to know everyone and what they're working on.
- Minor49er 4 months ago
  
  > Subversion docs are clear that it's a snarky name and that 'svn praise' and 'svn annotate' are neutral synonyms.
  A few years ago, some Atlassian developer changed "Blame" in the BitBucket UI to "Annotate". I remember a lot of people being frustrated because they couldn't find "blame" anywhere and the change was never officially announced. It just happened one day
  Someone opened a ticket with BitBucket about it which ended up drawing a lot of attention from frustrated users who couldn't find "blame", and their searches for it on Google led people to the ticket. Atlassian eventually responded saying that they made the change because "blame" sounds bad and can hurt people's feelings somehow (with no examples given of course, though ironically the dev who made the change certainly had hurt feelings after the upset masses had some choice words for the short-sighted decision. Though Atlassian doubled down and I believe closed the ticket without reverting the change, so the confusion remains, as far as I know)
  I don't think that they ever mentioned the Subversion/CVS parallel that was drawn to choose that name, so it was really confusing why that was selected. But this comment shed some light on that ancient incident
  
  anilakar 4 months ago
  
  If I ever saw an "annotate" command I'd immediately assume it's for adding notes as metadata outside the actual software versioning tool, not for seeing who wrote the code in question.
  Nomenclature matters. Do not reinvent terms just for fun.
  
  justusthane 4 months ago
  
  Agreed - I don't understand at all how the word "annotate" is being used here. It seems like "attribute" would be a better stand-in - as in "To whom can I attribute this code?"
  
  awesome_dude 4 months ago
  
  > ironically the dev who made the change certainly had hurt feelings after the upset masses had some choice words for the short-sighted decision.
  Dev probably became the public face for a decision made by someone else (eg. Product owner, TL, whatever the business structure is in Atlassian)
  
  ge96 4 months ago
  
  I'm not in disagreement about being able to tell who wrote some piece of code, I like gitlens in my vscode for ex.
  The feelings hurt thing is real, unfortunately for myself I am that person that gets butthurt but it's a phrasing thing, "why did you do this?" vs. something more neutral sounding like "hey this has this side effect are you aware".
  Anyway unfortunately in my case too we're not allowed to write tests so it really is an exercise in omniscience.
  
  Minor49er 4 months ago
  
  I'm curious if you have experienced people asking something as blunt and short-sighted as "why did you do this?" as the result of a blame. The blame should reveal the PR or commit that a change came from, which should answer that question already
  You should also write tests. They ensure that your code works as intended. Some teammates might not understand that untested code can cause more development time since broken features will have to be fixed in production, so highlighting bugs that have to be fixed as well as writing tests that cover as many cases as possible should shine some light for those still not understanding their value
  
  ge96 4 months ago
  
  That's the thing, I used to think tests are annoying but now I'm advocating for em, maybe a sign of growing up ha. Unfortunately not my call. Yeah it's just tone, tone changes the outcome of some conversation. Puts person on defensive, stops thinking, that kind of thing.
  
  Minor49er 4 months ago
  
  I mean, I get it. Priorities can also affect things. I'm in a similar situation at work where the manager is pushing to get a project out as fast as possible at the expense of testing and even basic planning. It's only serving to make things much worse because all of the short sighted decisions are causing all kinds of new problems that could have been avoided entirely
  Sometimes it's easier to adopt better practices when moving to another team or project altogether
- funcDropShadow 4 months ago
  
  Yes, the history of the name is as you write. But the value of the command lies in finding the commits, and, especially, in finding which lines changed together.
- dvt 4 months ago
  
  > EDIT: and one of the main reasons it's a useful feature is it tells you who to talk to to understand a piece of code, or to coordinate a roll back, or to do any other sort of communication. It probably matters more in a big company where code is changing frequently and you're unlikely to know everyone and what they're working on.
  It's actually a pretty awful feature because it misses so much context. I've been blamed before for changes which were technically my fault, but while my code was to spec, some unrelated part of the code I was interacting with was not (iirc it was some multi-threaded nonsense like a race condition or something).
  It was a super-stressful week of constantly having to defend my design decisions and white-boarding my thought process (think of the "am I taking crazy pills?!?" scene in Zoolander) as my senior coworker tried to gaslight and throw me under the bus.
  Maybe I've had a uniquely bad experience with it, but I've vowed to never use it (as a way to attribute `blame`). Code should be holistically understood and it's your job as a technical leader to know how the parts move, resolve issues without drama, and make sure your whole team is on the same page: this is a cohesive team, not an adversarial dick-measuring contest.
  
  cdogl 4 months ago
  
  That sounds like an awful workplace culture. I doubt the name of the git command is responsible, though.
  
  opello 4 months ago
  
  Just know that it can also be used in very positive ways.
  For instance, I may want to change some basic behavior. Easy enough, spend some time implementing and testing, and then run into a downstream consequence of that change while implementing. Now I need to make a decision. Reviewing the history of the relevant sections of code, using git blame, can help me uncover the context and ways in which the code I'm puzzling about changing has changed previously. This can be incredibly valuable and speed up or even obviate an amount of discussion around the potential change.
  
  pjc50 4 months ago
  
  This is why blameless post-mortems and a healthy culture are important. I'm sorry you encountered such a hostile culture. Perhaps "git blame" has the wrong name, but the idea of traceability is still important.
  Aviation is the shining example of this, combining high traceability (you should be able to track each part back to the factory and to all the technicians who have worked on it) with accident inquiries that are focused around finding cause and avoiding future risks rather than assigning blame.
  
  delusional 4 months ago
  
  I acknowledge that I have no idea what happened in this situation. Please don't take this as me justifying mean behavior.
  > Code should be holistically understood and it's your job as a technical leader to know how the parts move
  That's true when we design something. Once the design is done, and is broken, we have to tear it back apart to understand WHAT is broken. That's when blame is useful.
  I love using git blame. I love it even more when it comes back with something I made, because then I get to learn. When something I thought was safe turned out to break something, that's an invaluable chance to understand the system better.
  That being said, I've totally used the blame output to end a series of excuses from a junior about how "his code was definitely right, but everything else was garbage" because I really do not care. If it worked before, but doesn't work now, that's a problem. Part of the modern process of "fail fast" is also to build up taste about which parts of a working system are spooky.
  I find that some people take the "blameless" culture too far, and use it as an excuse not to reflect on outcomes. They just ruffle the code whenever there's a bug, and don't think critically about why that bug appeared. What that tells us about the system we're making.
  
  hn_user82179 4 months ago
  
  I work at a company with a mess of a monorepo but the git history is a gold mine. It's fascinating digging back into history and reading why certain decisions are made, or random pitfalls the author discovered, or context that was missing. It absolutely feels like a bit of a detective mystery trying to dig back and figure out if some line of code is a bug that was meant to do something else, or is functioning as intended and the requirements changed, or something else entirely.
  Ofc as my org has gotten bigger, we've lost a lot of the discipline around writing good commit messages so now it's just a mess of large code-changes with 1-line "bugfix" explanations :(
  
  awesome_dude 4 months ago
  
  > Ofc as my org has gotten bigger, we've lost a lot of the discipline around writing good commit messages so now it's just a mess of large code-changes with 1-line "bugfix" explanations :(
  I have a battle at the moment to try and get the team I am in (5 devs) to take their git commit messages and history seriously, but the "TL" has said that he "doesn't care that much about commits/history/etc"
  That bit us right on the ass when debugging someone else's branch recently, because the state needed to fix was across three seemingly unconnected commits, so a checkout of one commit + fix then needed to be tested across two other commits.
  
  awesome_dude 4 months ago
  
  > It's actually a pretty awful feature
  Obligatory T-Shirt link
  https://www.amazon.com/Blame-Ruining-Friendships-Since-T-Shi...
gleenn 4 months ago

Isn't this sort of an inconsequential point? The commit still has one and only one author and that's almost certainly what I'm looking for so I know who to go ask questions about their code. I also use it to find the commit but less frequently.
- sshine 4 months ago
  
  No, commits can be co-authored.
  And the committer and author don’t even need to be the same!
  But the point, as I read it, is: what matters is the context, i.e. if a line is faulty, what did things look like when it wasn’t faulty? The commit’s content is more often more important than the committer, although the committer is useful because you can ask them if they’re still around.
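
For anyone who wants to see the author/committer/co-author split on their own repo, here is a minimal sketch. It assumes a reasonably recent Git (the `%(trailers:...)` placeholder with `key=` and `valueonly` is not available in old releases), and `show_credit` is a hypothetical helper name.

```shell
# Print author, committer, and any Co-authored-by trailers for recent commits.
# Usage: show_credit [count]   (defaults to the last 5 commits)
show_credit() {
  git log -n "${1:-5}" \
    --format='%h author=%an committer=%cn%n%(trailers:key=Co-authored-by,valueonly)'
}
```

Commits where the two names differ are typically ones that were applied, rebased, or cherry-picked by someone other than the original author.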
- pjc50 4 months ago
  
  At my last workplace, the codebase was about 25 years old, there were three of us, and one of us was the original author. You could simply guess "Gerald wrote this" and you'd be right nine times out of ten. However, it turns out that software developers have finite memory themselves, and svn blame was useful in tracing a line of code back to the original ticket.
  Linking a line of code back to the commit is useful even if you can't ask the author about it. It tells you what other lines of code are involved and what the overall purpose is. It's significantly more useful if you can link it into documentation outside the code: ticketing systems, requirements docs, etc.
  The main limit to svn blame in that situation was that quite often it would hit commit 1, when the codebase had been imported from Visual SourceSafe.
- rav 4 months ago
  
  On a small team I usually already know who wrote the code I'm reading, but it's nice to see if a block of code is all from the same point in time, or if some of the lines are the result of later bugfixing. It's also useful to find the associated pull request for a block of code, to see what issues were considered in code review, to know whether something that seems odd was discussed or glossed over when the code was merged in the first place.
  
  mckn1ght 4 months ago
  
  I find the GitHub blame view indispensable for this kind of code archeology, as they also give you an easy way to traverse the history of lines of code. In blame, you can go back to the previous revision that changed the line and see the blame for that, and on and on.
  I really want to find or build a tool that can automatically traverse history this way, like git-evolve-log.
  
  eichin 4 months ago
  
  I've been carrying around a copy of "git blameall" for years - looks like https://github.com/gnddev/git-blameall is the same one - that basically does this, but keeps it all interleaved in one output (which works pretty well for archeology, especially if you're looking at "work hardened" code.)
  (Work hardening is a metalworking term where metal bent back and forth (or hammered) too much becomes brittle; an analogous effect shows up in code, where a piece of code that has been bugfixed a couple of times will probably need more fixes; there was a published result a decade or so back about using this to focus QA efforts...)
  
  follower 4 months ago
  
  "Cregit" tool might be of interest to you, it generates token-based (rather than line-based) git "blame" annotation views: https://github.com/cregit/cregit
  Example output based on Linux kernel @ "Cregit-Linux: how code gets into the kernel": https://cregit.linuxsources.org/
  I learned of Cregit recently--just submitted it to HN after seeing multiple recent HN comments discussing issues related to line-based "blame" annotation granularity:
  "Cregit-Linux: how code gets into the kernel": https://news.ycombinator.com/item?id=43451654
  
  weaksauce 4 months ago
  
  there is https://github.com/emacsmirror/git-timemachine which is really nice if you use emacs.
- lionkor 4 months ago
  
  No, if your commits are meaningful and have plenty of context, like they should, then you are not looking for the author. Instead, you are looking for "why is this here", and the commit should tell you.
- lubutu 4 months ago
  
  I mean, no. If you work on a codebase that's been going for more than a few years, the author likely doesn't even work there anymore. The commit is the important thing.
  
  iterance 4 months ago
  
  Frankly the commit message is usually the important thing. I care about why a change happened. Give me a Jira ticket, or a line of reasoning, or some documentation. I need to know this far more often than I care who literally typed the code in the computer.
  
  globular-toast 4 months ago
  
  You also shouldn't assume the commit author is the same person who literally typed the code. Git is a version control system, not an audit trail.
wilg 4 months ago

Many people enjoy pretending that the term "blame" in "git blame" is not a funny little programmer joke that we don't have to be upset about.
- remram 4 months ago
  
  It's funny because it always ends up telling me I did it.
  
  wilg 4 months ago
  
  One of the most delightful outcomes of having the silly name!
kazinator 4 months ago

git bisect is about which commit is to blame for a reproducible problem.
git blame is about which author most recently touched each line (in what commit); i.e. is to "blame" for that line having its current content.
You're right in that git blame is most useful for finding which commit touched a line. What was done in the commit is more important than who did it.
git blame is very useful even in a solo project where you already know that you wrote every commit.
- Lanolderen 4 months ago
  
  Unless you get lazy like me and start committing only out of shame once the modified file count reaches close to triple digits or prior to doing very sketchy changes.
  
  kazinator 4 months ago
  
  Is this something you do in a brand new project whose organization, direction and overall requirements are not clear? Or persistently?
  There is such a concept as a brand new project not requiring version control until it hits a certain stage: you know it when you get there.
  
  Lanolderen 4 months ago
  
  Persistently. My solo projects aren't particularly complex though so I haven't really wasted any time by not being able to use git history for debugging. I currently have 38 files modified on my solo work project. If I'm in a team I keep it somewhat tidy but solo I only treat commits as manual save points I use only when my spidey senses tingle or when I'm about to refactor something that works fine as it is. Also when I'm done with a large part of the SW so the next dev at least has some rough timeline for what got added when and how many times it was majorly iterated. It's not a good habit but it has yet to bite me in the ass so I learn.
  Edit: A large part of the reason now that I think about it is that I don't work off real tickets but just bugs I notice or things that get mentioned on the current solo work project. In a team I can just dissect the ticket and am forced to do only that ticket on the branch whereas solo I'm just jumping all over the place. Sometimes I'll do thing X partway, start considering options and in the meantime do thing Y so it's a mess but the tasks get done so.. For context the project is 1 year old developed from 0 by me. Essentially an internal log parsing and analysis tool for a couple formats. Nothing particularly complex.
  
  dspillett 4 months ago
  
  True, and I'm guilty there too, but there is a limited amount we can expect toolmakers to do to protect those of us who misuse their tools :)
kemitchell 4 months ago

I have seen `git blame` used to blame specific people. I've seen it work. Some of those people deserved some blame.
The manpage explains what the command does. How and why it's used is up to the user.
- lelandfe 4 months ago
  
  Alternatively git praise https://github.com/ansman/git-praise
  
  what 4 months ago
  
  Why wouldn’t you just set an alias in your global git config for this?
  
  dspillett 4 months ago
  
  Perhaps the author intended further additions, perhaps transforming the output to apply filters or add spurious superlatives for humour value (“Great work on line 420, User6942!”).
  
  account42 4 months ago
  
  You can't put an alias on your CV.
  
  einpoklum 4 months ago
  
  Now _that_ is one of the best two-liners I've ever seen!
gwbas1c 4 months ago

BTW, one of the more frustrating things about "git blame" comes about when cleaning up an old codebase: In my current job I had to move a lot of files, combine repos, reformat code, etc., etc.
"git blame" and similar tools often show my name, even though I didn't write the code.
- alisonatwork 4 months ago
  
  Most places I worked have a blame.ignoreRevsFile[0] somewhere on the top level to inhibit this. It's a bit awkward because first you need to commit, then you need commit again to update the commit hash in the ignore revs file, but it's great for filtering out pure refactor churn.
  [0] https://git-scm.com/docs/git-blame#Documentation/git-blame.t...
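
A minimal sketch of that workflow. The file name `.git-blame-ignore-revs` is only a common convention, and both function names are hypothetical; the file needs full commit hashes, which is why the helper resolves its argument first.

```shell
# Append the full hash of a formatting-only commit to the ignore list.
# Usage: ignore_rev <commit>
ignore_rev() {
  git rev-parse --verify "$1^{commit}" >> .git-blame-ignore-revs
}

# One-time setup per clone: make plain `git blame` consult the list.
enable_ignore_revs() {
  git config blame.ignoreRevsFile .git-blame-ignore-revs
}
```

The same list can be applied ad hoc with `git blame --ignore-revs-file .git-blame-ignore-revs <file>` if you would rather not touch the config.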
- follower 4 months ago
  
  "Cregit" tool might be of interest to you, it generates token-based (rather than line-based) git "blame" annotation views: https://github.com/cregit/cregit
  Example output based on Linux kernel @ "Cregit-Linux: how code gets into the kernel": https://cregit.linuxsources.org/
  I learned of Cregit recently--just submitted it to HN after seeing multiple recent HN comments discussing issues related to line-based "blame" annotation granularity:
  "Cregit-Linux: how code gets into the kernel": https://news.ycombinator.com/item?id=43451654
  Of course, in your situation I guess such a tool would only help if other people use it. :D
- graemep 4 months ago
  
  I was thinking that I would really like a tool that shows the history of bad code, and who actually wrote it and amended it, not just who last changed it.
  Particularly so if I can see that someone wrote bad code, so I can review the rest of their code.
- Izkata 4 months ago
  
  git log/blame have -C and -M to follow modified lines across files. Unlike other version control, you don't have to have used a special command to rename files, because it doesn't track files that way in the first place. The maximum -C can even look for sources in other commits.
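
As a sketch of what that looks like in practice (`blame_follow` is a hypothetical wrapper name): per the git-blame documentation, `-M` detects lines moved within a file, and each additional `-C` widens the search for lines copied from other files, with the third one looking in any commit.

```shell
# Blame with move and copy detection turned all the way up.
# Usage: blame_follow <file>
blame_follow() {
  git blame -M -C -C -C -- "$1"
}
```

Expect this to be noticeably slower than a plain blame on large histories.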
markerz 4 months ago

Ideally, you find the context for why a change was made
LarsenCC 4 months ago

True.
TZubiri 4 months ago

No. You are thinking of git bisect
- account42 4 months ago
  
  Bisect shows which commit is responsible for a yes/no behavior change. Blame shows which commit is responsible for a line of code. Both are useful for finding the responsible commit but for different things.
- lionkor 4 months ago
  
  No, bisect is not blame but for commits. Blame shows you which COMMIT is to blame. That's my point.
  
  TZubiri 4 months ago
  
  True
  Git log gives the author (et al) given a commit.
  Git blame gives the commit given the line and file.
  Worth noting that annotate and praise were added to address the semantics, regardless of whether they were the original intent or not
orthoxerox 4 months ago

Isn't that what git bisect is for?
- davely 4 months ago
  
  Sometimes you don't know which commit actually caused the problem.
  e.g., you realize that something broke A/B test logic on Friday. Sure, there are Jira tickets, but that's slow and annoying to dig through. There are commit messages, but things get squashed, etc. Plus, if you work in a monorepo with about 60 PRs a day, it's hard to know if it was your code or an associated library someone touched.
  That's exactly when git bisect helps. It quickly narrows down which commit introduced the bug when you don't know where to start looking. Once bisect identifies the problematic commit, you can then use git blame (if needed) to see who made those specific changes.
  Edit: Cleaned up what I was saying to hopefully avoid confusion.
  
  ants_everywhere 4 months ago
  
  I'm not quite sure what you're saying, but blame just tells you the last person and commit to change a line.
  If you want to know which commit actually caused a problem you would use bisect. That may be what you're saying, but it sounded a bit like you are saying blame is better for tracking the culprit commit.
- recursive 4 months ago
  
  Bisect is for finding behavior changes in O(log n) operations. Blame is for finding the last change to a line in one operation.
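
To make the bisect half of that concrete, a small sketch, assuming `HEAD` is known to be broken and you have a command that exits 0 on good commits and non-zero on bad ones. `find_culprit` is a made-up name.

```shell
# Binary-search history for the first commit where a check starts failing.
# Usage: find_culprit <known-good-rev> <check-command> [args...]
find_culprit() {
  good=$1
  shift
  # HEAD is assumed bad; "$good" is a revision where the check passes.
  git bisect start HEAD "$good" && git bisect run "$@"
  # Return to wherever we were before bisecting.
  git bisect reset
}
```

For example, `find_culprit v1.2.0 make test` would print the first bad commit, after which `git blame` or `git show` on that commit gives the line-level detail.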

mmcclimon 4 months ago

> You can invoke git-who as git who by setting up an alias in your global Git config

This works even without the alias, by the way: by default `git whatever` will search your path for `git-whatever` and execute it.
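
A tiny demonstration of that lookup, for the curious. `make_git_subcommand` and the `hello` subcommand are invented for the example; the only thing Git needs is an executable called `git-<name>` somewhere on your `PATH`.

```shell
# Create an executable git-<name> in a directory that is (or will be) on PATH.
# Usage: make_git_subcommand <dir> <name>
make_git_subcommand() {
  cat > "$1/git-$2" <<'EOF'
#!/bin/sh
echo "custom subcommand called with: $*"
EOF
  chmod +x "$1/git-$2"
}
```

After `make_git_subcommand ~/bin hello` (with `~/bin` on your `PATH`), running `git hello world` should dispatch to the script.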

weebst 4 months ago

Wow! I had no idea. Will need to update the README. Thanks for the tip!
- jeff_carr 4 months ago
  
  Yes, that is awesome. I wonder if "go" works like that also?
chatmasta 4 months ago

Has this behavior been the source of exploits in the past? Something about it feels dangerously presumptuous to me.
- mdaniel 4 months ago
  
  I am guessing it only resorts to that expansion if it doesn't _already_ know about the command, because $(printf '#!/bin/sh\necho pwned\n' > /bin/git-status; chmod 755 /bin/git-status; git status) results in the thing happening that you'd expect, not a mysterious message
  FWIW, both brew and kubectl also have adopted this behavior (of $(basename)-plugin style verb extensions) so I find it unlikely they'd all do it if it was a straight-up facepalm
  
  igorbark 4 months ago
  
  probably adding a confirmation message the first time the alias is used for each command would be good, it would be nice to know when i'm invoking git and when i'm invoking a third party binary regardless of any exploit attempts!
  
  account42 4 months ago
  
  If malicious code ends up in your $PATH you have much bigger problems than git having a seamless plugin architecture.

seanhunter 4 months ago

For a low-tech version of this, I have long had an alias (which I call "nerdwars") to "git shortlog -ns --no-merges" which just gives the number of commits by contributor from most to least. It's a good way to get a sense for who the major contributors in a project are.
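
A sketch of that alias plus a script-friendly variant. One thing worth flagging: when no revision is given and standard input is not a terminal, `git shortlog` reads a log from standard input instead, so the function (a hypothetical name) passes `HEAD` explicitly.

```shell
# One-time setup for interactive use (alias name taken from the comment above):
#   git config --global alias.nerdwars 'shortlog -ns --no-merges'

# Commit counts per author, most commits first, safe to call from scripts.
commit_counts() {
  git shortlog -ns --no-merges HEAD
}
```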

m000 4 months ago

Number of commits is not a very good metric to measure contributions. It would only work when there is an agreed style of commits and everyone sticks to that. Number of blame lines per contributor would be overall much more accurate, and immune to different styles of committing (e.g. squashed mega-commits vs rebased self-contained commits).

eMPee584 4 months ago

For anyone not aware of it: `tig` is a really cool TUI git frontend, and it has a beautiful `tig blame` sub command..

anotherpaulg 4 months ago

This is great. I do this sort of git-blame accounting to track how much code is written by AI versus humans in each release of my app.

My "blame script" has been slowing down as the repo size increases. I was just about to add caching, like you have.

Have you thought about adding the ability to limit the stats based on a set of file patterns? Perhaps like this, where the file follows gitignore conventions?

  git-who table -include-file <fname>
  git-who table -ignore-file <fname>

I tried to quickly add this functionality but unfortunately I don't know go.

weebst 4 months ago
That's a neat idea, I can see how it'd be useful.
If you have a shell that supports extended globbing, you could do something like:
```
  $ git who table */**/*.go
```
That works for me using Bash. I believe all that's happening here is that Bash is expanding the globs and passing a long list of individual filepaths as arguments to git who. Git who then passes them to git log so that it only tallies the commits you'd get by running:
```
  $ git log */**/*.go
```
- anotherpaulg 4 months ago
  
  Yup. It’s a complex enough set of in/excludes that I think that would get unwieldy for my use case.
  Details here:
  https://github.com/Aider-AI/aider/blob/main/scripts/blame.py
  Again, nice work on your tool. I’ll spend some more time trying to harness it for my need.
  
  weebst 4 months ago
  
  Thank you very much!
- JadeNB 4 months ago
  
  > $ git who table */**/*.go
  I might have my globbing syntax wrong, but I think that `*/**/*.go` is the same as `**/*.go` unless you have `*.go` files in the working directory.
reubenmorais 4 months ago

Git natively supports excludes in all pathspecs, e.g. `git log -- ':!generated/'` to exclude files in the `generated/` folder from showing up in the log.
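
A short sketch of exclude pathspecs in a slightly longer form. The excluded paths are placeholders and `log_handwritten` is a made-up name. Note that an exclude needs at least one positive pathspec next to it, hence the `.`.

```shell
# Log commits touching anything except generated code and JSON fixtures.
log_handwritten() {
  # ':(exclude)...' is the long form of ':!...'. Without glob magic, '*' may
  # also match '/', so '*.json' covers .json files at any depth.
  git log --oneline -- . ':(exclude)generated/' ':(exclude)*.json'
}
```

Since git-who is described upthread as passing its path arguments through to `git log`, the same pathspecs may well work there too, though I have not verified that.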

nbenitezl 4 months ago

Gitlab/Github should add a feature that any submitted merge requests automatically emails the last author of the code lines being modified, to let them know about the MR and provide any feedback if needed.

Or maybe someone has written a bot/Git hook for that?

fouronnes3 4 months ago

For a Linux user, you can already build such a system yourself quite trivially with git blame directly, piping it through awk and sort to email yourself that list with a cron job (the repository path, file, revision, and address are placeholders).

    (crontab -l 2>/dev/null; echo '15 22 * * * /usr/bin/git -C /path/to/repo blame --line-porcelain abc123.. -- file.txt | awk "/^author-mail/ {print \$2}" | sort -u | /usr/bin/mail -s "Authors" user@example.com') | crontab -

chrishill89 4 months ago

The Git project has a script for that: https://github.com/git/git/blob/master/contrib/contacts/git-...
eddd-ddde 4 months ago

Basically just an OWNERS file, but asynchronously?

p0w3n3d 4 months ago

Things I'm missing in git are not how many lines or commits a given developer did, which might lead in a poorly managed organisation to strangely calculated KPIs, but rather:

  - who deleted this line (which one?)
  - who is owner of this method (some guy refactored it or reformatted, but who is the REAL owner, or what was the history of this method)
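
Stock Git gets part of the way there with the pickaxe (`-S`) and line-range (`-L`) options of `git log`. A hedged sketch with made-up helper names; `-S` reports commits that changed the number of occurrences of a string, so it finds both the commit that added a line and the one that deleted it, but it will not see through reformatting that alters the string itself.

```shell
# Commits that added or removed occurrences of a string, newest first.
# Usage: who_touched_string <string> [path]
who_touched_string() {
  git log --format='%h %an %s' -S"$1" -- "${2:-.}"
}

# Full history of one function, as far as Git's funcname matching can follow it.
# Usage: function_history <funcname> <file>
function_history() {
  git log -L ":$1:$2"
}
```

The first line of `who_touched_string` output is usually the deletion if the string no longer exists in the tree.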

ginko 4 months ago

> - who is owner of this method (some guy refactored it or reformatted, but who is the REAL owner, or what was the history of this method)

It doesn't work perfectly, but with magit you can jump to the revision before the refactor/reformat, then do another blame from there. I chased a line of code through several layers of refactors that way before, and while the original author was long gone, it did help explain why things were initially done that way.

I heavily depend on git-blame to understand code. It's one reason why I generally dislike "cleanup" changes that just change formatting/naming for the sake of it.

jval43 4 months ago

Yes, exactly.
That is what I use git for each and every day.

andrewfromx 4 months ago

I like it. A problem I had right away is some people commit using two different emails. Like one from home computer and one from work computer. Would be nice to be able to define them as the same thing.

sebastianlay 4 months ago

You might be able to do that with built-in Git functionality called mailmap (documented under gitmailmap). It is basically a `.mailmap` file in which you can map multiple names and emails to a single canonical one.
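To illustrate the mapping idea, here is a minimal Python sketch. Git applies the mailmap itself, so nothing like this is needed in practice. The helper names `parse_mailmap` and `canonical` and the sample addresses are made up, and the parser is deliberately simplified: it keys only on the commit email and ignores the mailmap form that also matches on the commit name.

```python
import re

# Matches "Name <email>" where the name may be empty, e.g. "<a@b>".
_IDENT = re.compile(r"\s*([^<>]*?)\s*<([^<>]*)>")


def parse_mailmap(text):
    """Map lowercased commit email -> (canonical name or None, canonical email)."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        idents = _IDENT.findall(line)
        if not idents:
            continue
        name, email = idents[0]       # first identity is the canonical one
        commit_email = idents[-1][1]  # last address is the one to match
        mapping[commit_email.lower()] = (name or None, email)
    return mapping


def canonical(name, email, mapping):
    """Return the (name, email) pair an identity should be reported as."""
    new_name, new_email = mapping.get(email.lower(), (None, email))
    return (new_name or name, new_email)


# Hypothetical mailmap: the home address is folded into the work identity.
mailmap = """\
# Proper Name <proper@email> <commit@email>
Jane Doe <jane@work.example> <jane@home.example>
"""

mapping = parse_mailmap(mailmap)
print(canonical("jane", "jane@home.example", mapping))
print(canonical("Bob", "bob@example.com", mapping))
```

Identities without an entry pass through unchanged, which is why a mailmap only needs lines for the people who actually commit under more than one identity.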
- Noumenon72 4 months ago
  
  I set this up and tested with `git log` and then found that PyCharm's git client apparently doesn't support this. Disappointing.
- Cadwhisker 4 months ago
  
  It's great when you read further down the comments and come across a gem like this. I had no idea this was possible; thanks.

weebst 4 months ago

Like other commenters have said, mailmap does this and git who will respect your mailmap file.
- max23_ 4 months ago
  
  Like the other comment says, this is an "I came looking for copper but found gold" moment for me. Thanks!

jtwaleson 4 months ago

That's what the mailmap is for.

dolmen 4 months ago
```
  git help mailmap
```

ttyyzz 4 months ago

"This requires that you have Go, Ruby, and the rake Ruby gem installed." - sticking to the binaries then :) Cool little project, will try it tomorrow!

account42 4 months ago

I don't think the comparison to git blame is needed/warranted. While the 'blame' in git blame suggests the tool is about identifying authors, its main purpose is to identify commits so that you can find out the context of why something was changed. This tool instead seems to be a fancy `git shortlog -sn`.

patrickdevivo 4 months ago

This is such a cool tool. It's a better approach to solving many of the questions I built MergeStat to answer (https://github.com/mergestat/mergestat-lite). It's been some time, but I also wrote a `git blame ...` parser in Go: https://github.com/mergestat/gitutils/blob/main/blame/blame.... :)

Amazing work and excited to dig into this more thoroughly

nextts 4 months ago

Don't tell the higher ups about this stack ranking tool.

kelseydh 4 months ago

I would love to see this get a brew release.

camdotcom14 4 months ago

it's ready!

ycombinatrix 4 months ago

Thank you! My consumer-scale git blaming was leaving me with a woeful feeling of inadequacy.

dcchambers 4 months ago

This is fantastic! I love it. Pretty quick too...

For a rails codebase that is ~18 years old, has 1695 committers and more than 220,000 commits:

  time git who
  ...
  real    0m2.885s
  user    0m2.711s
  sys     0m0.767s

rednafi 4 months ago

Cool stuff. I like using Git via the CLI, but when it comes to blame, I simply use the preview UI of the VSCode GitLens extension. It takes half a second to launch it from the command palette and inspect the blame.

davely 4 months ago

I have some VS Code extension (errr... not sure which) that faintly inlines the git blame result on each line of code you're working on. I find it kind of handy.
- apple1417 4 months ago
  
  That was actually recently added as a built-in feature
  https://code.visualstudio.com/updates/v1_97#_git-blame-infor...
  Like the sibling comment, I didn't want to run all of GitLens just for it, but now that it's a built-in I've also been finding it quite useful.
- lelandfe 4 months ago
  
  GitLens https://marketplace.visualstudio.com/items?itemName=eamodio....
  I uninstalled it, I seem to recall it impacting the speed of VS Code a good bit.

physicsguy 4 months ago

I like this one the best: https://github.com/jayphelps/git-blame-someone-else

unquietwiki 4 months ago

This looks like an almost pure Golang program, but still has a Ruby dependency. Is there a component/library that Ruby provides, that Go doesn't have? Or some other logic going on?

ajanuary 4 months ago

I’m pretty sure it’s just using Ruby for rake, which is a task runner. So Ruby is only needed for the build process.

avalys 4 months ago

I look forward to the inevitable upgraded version, “git whom”.

camdotcom14 4 months ago

wow that is actually hilarious! XD

numbers 4 months ago

I love TUI tools, but I'm not too familiar with Golang. Now I'm thinking I should start looking into using Go for TUIs.

This is a great tool!

qudat 4 months ago

At pico.sh we have been experimenting with TUIs and remote CLIs successfully for a few years. You can see how we build our SSH TUI app here: https://github.com/picosh/pico/tree/main/pkg/tui

max23_ 4 months ago

This is neat.

I recently made a similar PowerShell script, but it searches in reverse: from a filename to the authors, by commits.

syhol 4 months ago

I've been using `git summary` from tj/git-extras for a while. It seems to do a similar job.

fmeyer 4 months ago

Nice one, works better than mine;

I've been using a git alias for quite some time

`lead = shortlog -s -n --all --no-merges`

jtwaleson 4 months ago

Very cool, I'm working on something similar as part of a bigger project (not TUI related). I'm interested in how you did blame caching, will take a look at the implementation. I am trying to do a "forward blame" so that the blame of new commits can be created very quickly. Happy to exchange some thoughts around this!

natemwilson 4 months ago

Cool, but how do I increase the number of rows? Is it always just the top ten?

weebst 4 months ago

The -n flag does this. Use -n 0 to show all rows

eddiejaoude 4 months ago

I love CLI tools, plus it is open source!

coryvirok 4 months ago

https://shortlog.io/

_andrei_ 4 months ago

go install github.com/sinclairtarget/git-who@latest

LarsenCC 4 months ago

Cool stuff!

kazinator 4 months ago

Tiny prototype implementation:

Run on the log from GNU Bison. Names are anonymized so that search engines don't associate this comment with them:

  $ ./gwho git-log-stat.txt
  NAME                     LAST-SEEN                      FILES    LINES+   LINES- COMMITS
  A___ D_______            Tue Sep 20 08:19:02 2022 +0200 17083    356066   255931    4440
  P___ E_____              Mon Mar 17 17:46:43 2025 -0700  4496     61898    71486    1123
  J___ E_ D____            Sun Aug 21 17:35:26 2011 -0400  3922     75517    50121     612
  R_____ A_____            Thu May 2 16:43:00 2002 +0000    101      7631     4522      23
  J____ T____              Sun Jan 21 16:43:58 2001 +0000   200      8308     3205      60
  P___ H________           Tue Feb 26 16:28:36 2013 -0800   122      5057     2864      26
  A___ R_______            Wed Jan 5 15:47:25 2011 +0200    124      5297     2101      30
  T________ R______        Tue Nov 13 10:38:49 2012 +0000   229      3841     2744      94
  V_______ T_____          Wed Nov 11 18:55:15 2020 +0100    67      4739     1128      17
  J__ M_______             Sat Jan 18 20:52:21 2020 -0800   337      1569     3894      45
  J___ M_____ G_______     Mon May 12 00:58:38 2008 +0000    91      2570     2060      49
  R______ M_ S_______      Mon Jan 5 00:25:39 1998 +0000    134      2978     1155      64
  P____ B______            Tue Nov 11 13:37:36 2008 +0100    90      2991      786      17
  A____ V___               Mon Sep 19 19:09:20 2022 +0200   130      2335      943      46
  M___ A_____              Sun Jan 20 15:59:34 2002 +0000   201      1797     1291      76
  D____ J__                Sun Dec 7 21:54:45 2008 -0800     57      2142      816      15
  P_____ B___              Fri Oct 19 11:03:50 2001 +0000    67      1771     1182      19
  E___ B____               Thu Aug 27 10:56:53 2009 -0600    61      2172      590      22
  V_____ S_____            Fri Jun 29 16:23:42 2012 +0200    64       865      918       9
  D____ M________          Thu Nov 10 22:34:22 1994 +0000    26      1526       84      18
  D_____ H_________        Thu Jun 13 10:08:19 2013 +0200     6      1142       54       1
  V______ I______          Sat Jan 23 13:25:18 2021 -0500    47       824      360      16
  A_____ V___________      Thu Feb 27 09:52:03 2020 +0100    25       524      189      13
  V_____ M______ C______   Fri Feb 14 18:41:55 2020 +0100    16       284      364       1
  J_____ W___              Tue Feb 16 08:00:28 2021 -0600    18       306       33       3
  W_______ P____           Thu Feb 21 17:08:18 2008 +0000    13       195       76       9
  J___ S____               Wed Jul 26 00:30:05 2017 -0400    67        88       88      63
  E___ S_ R______          Wed Feb 13 10:39:54 2019 -0500     2       133       33       1
  Y_______ K_____          Mon Nov 11 08:57:15 2019 +0900     5       125       13       2
  K______ K_______         Sun Jan 27 06:58:17 2019 +0100     3        85       24       1
  H_ S_ T___               Fri Mar 1 06:16:54 2019 +0100      1        45       51       1
  T__ L__________          Tue Mar 27 19:28:02 2012 +0000     9        81        9       5
  T__ V__ H_____           Fri Jan 11 15:32:06 2002 +0000     7        55       34       3
  A________ D_________     Wed May 14 18:41:48 2003 +0000     3        66       16       1
  L_____ V_____            Fri Aug 9 14:24:14 2019 +0200      8        71        2       2
  M______ D_ B_________    Thu Jul 30 20:53:35 2020 +0200    10        30       29       3
  J______                  Tue Nov 20 22:02:20 2018 +0100     3        30       28       3
  J_______ N_____          Tue Dec 15 22:03:18 2009 -0600    10        42       13       3
  N___ F_______            Mon Sep 6 19:51:09 1993 +0000      7        36       18       7
  K__ K______              Tue Oct 13 15:39:41 2020 -0700     5        26       28       2
  E_____ S________         Mon Dec 10 15:18:37 2018 +0200    10        25       25       1
  M_____ R____             Wed Nov 18 09:10:01 2020 +0100    10        26       13       2
  J___ B_____              Mon Oct 2 20:04:58 2000 +0000      3        26        8       1
  T_____ P________         Tue May 19 22:05:22 2020 +0200     4        29        2       1
  A_____ B_______          Sun Mar 6 22:19:18 2011 -0500      6        24        2       2
  N___ G_____              Tue Oct 27 06:12:27 2020 +0000     4        14       11       3
  B____ K_____             Sat Feb 19 19:24:07 2011 -0500     6        22        2       2
  A___ S______             Wed Oct 31 14:01:31 2018 +0000     1        12        9       1
  S_____ T______           Mon Nov 24 15:27:49 2008 +0100     1        11        6       1
  F______ K____            Tue May 14 00:25:23 2002 +0000     3        11        6       2
  A______ H______          Fri Apr 29 04:08:35 2022 -0400     2         8        6       1
  B____ H_____             Sat Dec 18 18:45:46 2021 +0100     4         3        9       2
  T___ C_ M_____           Tue Nov 10 07:36:11 2020 +0100     2         8        3       1
  E______ S_________       Fri Nov 4 11:50:32 2022 -0700      3         5        5       1
  k_____ y                 Mon Nov 11 23:27:37 2019 +0900     3         3        3       3
  S______ L________        Sat Jul 21 17:24:23 2012 +0200     3         3        3       2
  A____ D_______           Sat Feb 15 10:49:14 2020 +0100     2         2        2       2
  J_____ L_                Fri Aug 24 17:35:32 2018 +0000     1         1        1       1
  D_____ H______           Wed Nov 29 01:26:22 1995 +0000     1         1        1       1
  A______ S_____           Sat Sep 28 00:00:34 2013 +0200     1         1        1       1
  A_____ R___              Mon Jun 14 21:54:40 2021 +0000     1         1        1       1

Code:

  #!/usr/bin/env txr
  @(do
     (defstruct author ()
       name
       e-mail
       last-seen
       (files 0)
       (lines+ 0)
       (lines- 0)
       (commits 0))

     (defvarl ah (hash)))
  @(repeat)
  Author: @name <@addr>
  Date: @date
  @(skip)
   @files file@nil changed, @ins insertion@nil, @del deletion@nil
  @  (set name @(flow name ;; anonymize name
                  (spl " ")
                  (map (op map (do if (plusp @2) #\_ @1) @1 0))
                  (join-with " ")))
  @  (do (let ((a (or [ah name]
                      (new author
                           name name
                           e-mail addr
                           last-seen date))))
           (inc a.commits)
           (inc a.files (tointz files))
           (inc a.lines+ (tointz ins))
           (inc a.lines- (tointz del))
           (set [ah name] a)))
  @(end)
  @(do
     (flow (hash-values ah)
       (csort @1 > [callf + .lines+ .lines-])
       (cons (new author
                  name "NAME" last-seen "LAST-SEEN" files "FILES"
                  lines+ "LINES+" lines- "LINES-" commits "COMMITS"))
       (each ((a @1))
         (put-line `@{a.name 24} @{a.last-seen 30} @{a.files -5} \ \
                    @{a.lines+ -8} @{a.lines- -8} @{a.commits -7}`))))
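For readers who don't know TXR, here is a rough Python rendering of the same aggregation. It is a sketch under assumptions, not a verified translation: it assumes the input is `git log --stat` text in the default newest-first order, it skips the name anonymization, and it counts every commit with an `Author:` line, including ones that have no stat summary. The names `summarize` and `sample` are made up for illustration.

```python
import re

AUTHOR = re.compile(r"^Author:\s+(.*?)\s+<([^>]*)>")
DATE = re.compile(r"^Date:\s+(.*)$")
# Summary line such as " 2 files changed, 11 insertions(+), 2 deletions(-)".
# Either the insertions part or the deletions part may be missing.
SUMMARY = re.compile(
    r"^ (\d+) files? changed"
    r"(?:, (\d+) insertions?\(\+\))?"
    r"(?:, (\d+) deletions?\(-\))?"
)


def summarize(log_text):
    """Aggregate per-author totals from `git log --stat` text."""
    authors = {}
    current = None
    for line in log_text.splitlines():
        m = AUTHOR.match(line)
        if m:
            name = m.group(1)
            current = authors.setdefault(name, {
                "name": name, "email": m.group(2), "last_seen": None,
                "files": 0, "added": 0, "deleted": 0, "commits": 0,
            })
            current["commits"] += 1
            continue
        if current is None:
            continue
        m = DATE.match(line)
        if m:
            # Newest-first input: the first date seen per author is the latest.
            if current["last_seen"] is None:
                current["last_seen"] = m.group(1)
            continue
        m = SUMMARY.match(line)
        if m:
            current["files"] += int(m.group(1))
            current["added"] += int(m.group(2) or 0)
            current["deleted"] += int(m.group(3) or 0)
    # Most lines touched (added + deleted) first.
    return sorted(authors.values(),
                  key=lambda a: a["added"] + a["deleted"], reverse=True)


# Hypothetical three-commit log, newest first.
sample = """\
commit 1111111
Author: Alice Example <alice@example.com>
Date:   Tue Sep 20 08:19:02 2022 +0200

    Add feature

 src/a.c | 10 ++++++++--
 src/b.c |  3 +++
 2 files changed, 11 insertions(+), 2 deletions(-)

commit 2222222
Author: Bob Example <bob@example.com>
Date:   Mon Sep 19 19:09:20 2022 +0200

    Fix typo

 README | 1 +
 1 file changed, 1 insertion(+)

commit 3333333
Author: Alice Example <alice@example.com>
Date:   Sun Sep 18 10:00:00 2022 +0200

    Initial import

 src/a.c | 5 +++++
 1 file changed, 5 insertions(+)
"""

for a in summarize(sample):
    print(f'{a["name"]:<24} {(a["last_seen"] or ""):<30} {a["files"]:>5} '
          f'{a["added"]:>8} {a["deleted"]:>8} {a["commits"]:>7}')
```

The summary pattern requires exactly one leading space so that commit message lines, which git indents by four spaces, cannot be mistaken for a stat summary.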

einpoklum 4 months ago

Followed the link, and the README said:

> This requires that you have Go, Ruby, and the rake Ruby gem installed.

That doesn't cut it for me. git - once built - depends on C libraries and Perl. If you want to add something onto git (that is not specifically targeting Go, or Ruby etc.) - it should not IMNSHO depend on other things.

That doesn't mean you can't write your tool in some modern fashionable language, but eventually you need to bring it down to earth (or rather earth + Perl).

weebst 4 months ago

These are all build dependencies. You don't need any of these just to run git who. The language could be clearer; I'll update it.
- einpoklum 4 months ago
  
  Ah, ok, great! I'll try it then...