Oxidise ‘git summary’ and ‘git summary --line’ #1742

Open
sanga opened this issue Dec 26, 2024 · 4 comments
Labels
enhancement New feature or request help wanted Extra attention is needed

Comments


sanga commented Dec 26, 2024

Summary 💡

I.e., https://github.com/tj/git-extras/blob/main/bin/git-summary

That’s a shell script, and I imagine moving it to Rust would give a significant performance boost, particularly with ‘--line’, which can be kinda slow on large repos. Also, as it appears to use fairly standard git tools, I suppose everything might already be in place (I noticed that GO just gained some support for blame), though TBH I haven’t done the due diligence, so I’m not 100% sure.

If everything is in place, I guess I could even take a crack at this myself, if you'd agree to take that feature in?

Motivation 🔦

git summary is a nice tool for getting an idea of the main contributors to an unknown repo, but it’s kinda slow on large repos.

@sanga sanga added the enhancement New feature or request label Dec 26, 2024
Author

sanga commented Dec 26, 2024

To be clear, I mean including this in 'gix' or 'ein' directly

Member

Byron commented Dec 26, 2024

Thanks for making me aware!

Indeed I think all the functionality that's needed for a fast implementation of git summary is there. Personally, I'd use onefetch for this:

❯ onefetch
                 ++++++                    Sebastian Thiel ~ git version 2.39.5 (Apple Git-154)
              ++++++++++++                 ----------------------------------------------------
          ++++++++++++++++++++             Project: git (27 branches, 1254 tags)
       ++++++++++++++++++++++++++          HEAD: d882f382b3 (master, origin/master)
    ++++++++++++++++++++++++++++++++       Pending: 1+- 18+
 +++++++++++++************+++++++++++++    Version: v2.48.0-rc0
+++++++++++******************++++++++;;;   Created: 19 years ago
+++++++++**********************++;;;;;;;   Languages:
++++++++*********++++++******;;;;;;;;;;;              ● C (47.0 %) ● Shell (46.9 %)
+++++++********++++++++++**;;;;;;;;;;;;;              ● Perl (2.0 %) ● TCL (1.8 %)
+++++++*******+++++++++;;;;;;;;;;;;;;;;;              ● Makefile (0.7 %) ● Python (0.6 %)
+++++++******+++++++;;;;;;;;;;;;;;;;;;;;              ● Other (1.1 %)
+++++++*******+++:::::;;;;;;;;;;;;;;;;;;   Authors: 35% Junio C Hamano 26567
+++++++********::::::::::**;;;;;;;;;;;;;             6% Jeff King 4485
++++++++*********::::::******;;;;;;;;;;;             3% Johannes Schindelin 2296
++++++:::**********************::;;;;;;;   Last change: a week ago
+++::::::::******************::::::::;;;   Contributors: 2290
 :::::::::::::************:::::::::::::    URL: https://github.com/git/git
    ::::::::::::::::::::::::::::::::       Commits: 75634
       ::::::::::::::::::::::::::          Churn (714): Makefile 27
          ::::::::::::::::::::                          reftable/stack.c 22
              ::::::::::::                              …/buildsystems/CMakeLists.txt 21
                 ::::::                    Lines of code: 603415
                                           Size: 42.49 MiB (4574 files)
                                           License: GPL-2.0-only

However, in order to get --line information, I think one could use something like ein t hours -pl. Thus, I believe that all of the functionality is already present in ein t hours, and one could probably refactor it so that it can power something like ein t summary as well.

Research

For completeness, here is the script's output when executed on the Git repository itself.

❯ time bash summary.sh
git: 'root' is not a git command. See 'git --help'.

The most similar commands are
        hook
        remote

 project     : git
 repo age    : 20 years
 branch:     : master
 last active : 8 days ago
 active on   : 6738 days
 commits     : 75634
 files       : 4574
 uncommitted :        5
 authors     :
 26567  Junio C Hamano                   35.1%
  4485  Jeff King                        5.9%
  2296  Johannes Schindelin              3.0%
  1945  Ævar Arnfjörð Bjarmason          2.6%
  1824  Nguyễn Thái Ngọc Duy             2.4%
  1401  Shawn O. Pearce                  1.9%
  1279  René Scharfe                     1.7%
  1203  Patrick Steinhardt               1.6%
  1163  Elijah Newren                    1.5%
  1118  Linus Torvalds                   1.5%
   954  Michael Haggerty                 1.3%
   866  brian m. carlson                 1.1%
   855  Jonathan Nieder                  1.1%
   820  Derrick Stolee                   1.1%
   754  Jiang Xin                        1.0%
   745  Taylor Blau                      1.0%
   641  Christian Couder                 0.8%
   638  Stefan Beller                    0.8%
   600  Eric Wong                        0.8%
   577  SZEDER Gábor                     0.8%
[..]
bash summary.sh  1.45s user 0.23s system 110% cpu 1.509 total

In comparison, here is the output of ein t hours, a tool that probably already processes all the data that the script sees:

❯ time ein t hours
 12:40:09 traverse commit graph done 75.6K commits in 0.39s (193.0K commits/s)
 12:40:09        estimate-hours Extracted and organized data from 75634 commits in 302.083µs (250374896 commits/s)
total hours: 61139.10
total 8h days: 7642.39
total commits = 75634
total authors: 2264
total unique authors: 2153 (4.90% duplication)
ein t hours  0.46s user 0.11s system 135% cpu 0.421 total
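
As an aside, the "(4.90% duplication)" figure is consistent with taking the difference between total and unique authors as a share of the total. A minimal Rust sketch of that arithmetic, assuming that formula (the function name is made up for illustration and is not taken from ein):

```rust
/// Share of author identities that collapse into another identity, in percent.
/// Assumption: duplication = (total - unique) / total. This matches the printed
/// figure but is not confirmed against the `ein` source.
fn duplication_percent(total_authors: u32, unique_authors: u32) -> f64 {
    if total_authors == 0 {
        return 0.0;
    }
    let duplicates = total_authors.saturating_sub(unique_authors);
    f64::from(duplicates) / f64::from(total_authors) * 100.0
}

fn main() {
    // Figures taken from the `ein t hours` run above.
    let pct = duplication_percent(2264, 2153);
    println!("{pct:.2}% duplication");
}
```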

With the -p flag, it shows more information:

[..]
Jeff King <[email protected]>
4485 commits found
total time spent: 3513.56h (439.20 8h days, 5.75%)

Junio C Hamano <[email protected]>
26567 commits found
total time spent: 14482.98h (1810.37 8h days, 23.69%)
[..]

And with the -l flag, it also obtains line information:

[..]
Junio C Hamano <[email protected]>
26567 commits found
total time spent: 14482.98h (1810.37 8h days, 23.69%)
total lines added/removed: 271319/133779 (6.49%)

total hours: 61139.10
total 8h days: 7642.39
total commits = 75634
total authors: 2264
total lines added/removed/remaining: 3859275/2379017/1480258
total unique authors: 2153 (4.90% duplication)
stats omitted for 19524 merge commits
ein t hours -pl  67.43s user 1.82s system 1490% cpu 4.646 total
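
The percentages in this output can be reproduced from the totals. The sketch below assumes that the hours share is an author's hours over total hours, and that the lines share is an author's added plus removed lines over the total added plus removed lines. Both assumptions match the printed numbers, but they are inferred from the output and not taken from the ein source; `percent` is a hypothetical helper.

```rust
/// `part` as a percentage of `whole`; returns 0 for an empty whole.
fn percent(part: f64, whole: f64) -> f64 {
    if whole == 0.0 {
        0.0
    } else {
        part / whole * 100.0
    }
}

fn main() {
    // Numbers copied from the `ein t hours -pl` output above.
    let (author_hours, total_hours) = (14482.98_f64, 61139.10_f64);
    let (author_added, author_removed) = (271_319_u64, 133_779_u64);
    let (total_added, total_removed) = (3_859_275_u64, 2_379_017_u64);

    let hours_share = percent(author_hours, total_hours);
    let eight_hour_days = author_hours / 8.0;
    let lines_share = percent(
        (author_added + author_removed) as f64,
        (total_added + total_removed) as f64,
    );
    // "remaining" in the totals line is simply added minus removed.
    let remaining = total_added - total_removed;

    println!("hours share: {hours_share:.2}% ({eight_hour_days:.2} 8h days)");
    println!("lines share: {lines_share:.2}%");
    println!("remaining lines: {remaining}");
}
```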

When using --line on git summary, it seems to invoke a Git process per commit, which is very slow.

> time ./summary.sh --line
git: 'root' is not a git command. See 'git --help'.

The most similar commands are
        hook
        remote

 project     : git
^C
./summary.sh --line  0.01s user 0.01s system 0% cpu 1:39.39 total

I didn't take the time to let that finish, as it's clearly not made for more than a few hundred commits.
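
To illustrate the difference, the author table can be built in a single pass over one stream of author names, for example the output of one `git log --format=%aN` call, instead of spawning a process per item. The following is only a sketch of that idea with made-up sample input; it is not how git summary or ein t hours is implemented, and `count_commits_by_author` is a hypothetical name.

```rust
use std::collections::HashMap;

/// Count commits per author from text with one author name per line,
/// e.g. as printed by a single `git log --format=%aN` call.
/// Blank lines are ignored.
fn count_commits_by_author(log_output: &str) -> HashMap<&str, usize> {
    let mut counts = HashMap::new();
    for name in log_output.lines().map(str::trim).filter(|l| !l.is_empty()) {
        *counts.entry(name).or_insert(0) += 1;
    }
    counts
}

fn main() {
    // Made-up sample input; a real run would read the output of one git process.
    let sample = "Junio C Hamano\nJeff King\nJunio C Hamano\n";
    let counts = count_commits_by_author(sample);
    for (author, commits) in &counts {
        println!("{commits:>6}  {author}");
    }
}
```

Iteration order of a `HashMap` is unspecified, so a real report would sort the entries before printing.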

@Byron Byron added the help wanted Extra attention is needed label Dec 26, 2024
Author

sanga commented Dec 27, 2024

OK, that seems like it's 95% of the way there for me. I guess the only missing feature compared to git summary --line is that ein doesn't appear to count remaining lines per author (at least I didn't spot it). That's kinda nice to know: if an author's remaining lines are at or near zero, you know that the author is inactive.

Other than that, it's mostly just small UX things for me. For example, I think it would be useful to have the list of authors ordered by how much of the project they authored (i.e., sorted by number of commits or number of lines touched).

Would you accept a PR to:

  • add remaining lines per author and
  • sort the list by "amount of contribution" (i.e., lines or commits)

?
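
A rough Rust sketch of what the two items could look like, under stated assumptions: `AuthorStats`, `SortBy` and `sort_by_contribution` are hypothetical names and not the types ein uses, the sample authors are made up, and `net_lines` (added minus removed) is only a cheap proxy. The per-author count that git summary --line reports is based on lines still present in the tree, which this sketch does not compute.

```rust
/// Hypothetical per-author record; field names are illustrative, not taken from `ein`.
#[derive(Debug, Clone, PartialEq)]
struct AuthorStats {
    name: String,
    commits: u32,
    lines_added: u64,
    lines_removed: u64,
}

impl AuthorStats {
    /// Lines added plus lines removed.
    fn lines_touched(&self) -> u64 {
        self.lines_added + self.lines_removed
    }

    /// Added minus removed; may be negative for authors who mostly delete.
    /// This is a proxy only, not a count of lines that survive in the tree.
    fn net_lines(&self) -> i64 {
        self.lines_added as i64 - self.lines_removed as i64
    }
}

#[derive(Debug, Clone, Copy)]
enum SortBy {
    Commits,
    LinesTouched,
}

/// Sort in descending order of contribution, breaking ties by name.
fn sort_by_contribution(authors: &mut [AuthorStats], key: SortBy) {
    authors.sort_by(|a, b| {
        let primary = match key {
            SortBy::Commits => b.commits.cmp(&a.commits),
            SortBy::LinesTouched => b.lines_touched().cmp(&a.lines_touched()),
        };
        primary.then_with(|| a.name.cmp(&b.name))
    });
}

fn main() {
    // Made-up sample data.
    let mut authors = vec![
        AuthorStats { name: "alice".into(), commits: 10, lines_added: 500, lines_removed: 450 },
        AuthorStats { name: "bob".into(), commits: 3, lines_added: 2000, lines_removed: 100 },
    ];
    for key in [SortBy::Commits, SortBy::LinesTouched] {
        sort_by_contribution(&mut authors, key);
        println!("sorted by {key:?}:");
        for a in &authors {
            println!(
                "{:>6} commits {:>8} touched {:>8} net  {}",
                a.commits,
                a.lines_touched(),
                a.net_lines(),
                a.name
            );
        }
    }
}
```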

Member

Byron commented Dec 27, 2024

Here is what summary --line looks like on the gitoxide repository; it took 1m53s.

 project     : gitoxide
 lines       :   351996
 authors     :
331667 Sebastian Thiel              94.2%
5472 Eliah Kagan                    1.6%
1675 Conor Davis                    0.5%
1202 Sidney Douw                    0.3%
1055 Ed Page                        0.3%
 844 Yuri Astrakhan                 0.2%
 808 Christoph Rüßler               0.2%
 804 Pascal Kuthe                   0.2%
 803 Edward Shen                    0.2%
 767 Jiahao XU                      0.2%
 748 Nathaniel Brough               0.2%
[..]

The data needed for this is already collected by ein t hours, even though I think the presentation really belongs in ein t summary. hours is a rewrite of the respective Node tool, which likewise was too slow for use in typical repositories.

With the right refactoring, I think ein t hours and ein t summary can share the code that does the heavy lifting, and ein t hours can stay what it is.
