Comments (18)

pwaller avatar pwaller commented on August 11, 2024

I like the idea in principle, but I think this repository has enough followers that it may cause harm to drop the old content.

In practice, the repository is quite small even at 11MiB.

If you decide to do it, you could keep a copy of the old repository around at llir/llvm-legacy, analogous to https://github.com/go-gl-legacy/gl. That way, if anyone does need the old content (e.g., they were depending on a specific git hash), it still exists and can be used to figure out the corresponding git hash in the new repository.

Maybe it's possible to keep the old history around in a separate git ref which doesn't get cloned by default. But in that case I guess the content would be harder to discover.
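One way to realise the separate-ref idea, as a rough sketch: push the old tip to a ref outside refs/heads/ and refs/tags/. The namespace refs/legacy/ and the helper names below are hypothetical, not something this repository uses. A default clone only fetches branches and tags, so such a ref is skipped unless someone asks for it by name, which is also why it would be harder to discover.

```shell
# Park the pre-rewrite history under a custom ref namespace on the remote.
# refs/legacy/ is a hypothetical name; any namespace outside refs/heads/ and
# refs/tags/ is skipped by the default clone refspec.
archive_old_history() {   # usage: archive_old_history <remote> <commit-ish>
  git push -q "$1" "$2:refs/legacy/master"
}

# Anyone who needs the old objects later fetches that ref explicitly.
fetch_old_history() {     # usage: fetch_old_history <remote>
  git fetch -q "$1" 'refs/legacy/master:refs/remotes/legacy/master'
}
```

For example, `archive_old_history origin v0.2.1` would be run once before the force push, and `fetch_old_history origin` by anyone who later needs to look up an old hash.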

from llvm.

sbinet avatar sbinet commented on August 11, 2024

@pwaller why matplotlib? gonum/plot is so much better :P

from llvm.

mewmew avatar mewmew commented on August 11, 2024

Thanks for creating the issue, I agree, it's easier to keep the discussion here.

First, can I clarify the question - are you asking how to remove lots of old large assets from the history of the repository?

Exactly!

On the other hand, I did a git clone just now to check the size of the repo, and it wasn't as bad as I had thought. Perhaps we don't need to do this after all.

[u@x1 ~]$ time git clone https://github.com/llir/llvm
Cloning into 'llvm'...
remote: Enumerating objects: 20, done.
remote: Counting objects: 100% (20/20), done.
remote: Compressing objects: 100% (14/14), done.
remote: Total 10760 (delta 10), reused 15 (delta 6), pack-reused 10740
Receiving objects: 100% (10760/10760), 9.03 MiB | 1.47 MiB/s, done.
Resolving deltas: 100% (6437/6437), done.

real	0m7.715s
user	0m1.810s
sys	0m0.521s

[u@x1 ~]$ du -hs llvm
11M	llvm

I think we could prune the repo down to 1 MB or so instead of 11 MB, but the question is if it's worth it, given that it requires a force push.

However, should we decide to do this, then having it ready before the v0.3.0 release seems like a perfect time.

from llvm.

mewmew avatar mewmew commented on August 11, 2024

I like the idea in principle, but I think this repository has enough followers that it may cause harm to drop the old content.

In practice, the repository is quite small even at 11MiB.

Agreed. Had the repo been at 100 MB, then we probably would have done it, but at this size it does not seem worth the potential harm to users. (The idea with shrinking the repo was of course to make it easier for users to make the initial download, especially those who happen to be on a slow Internet connection, as may be common in parts of Asia, etc).

So, for now, I'm fine with keeping it as it is and just being careful when adding large content in the future. Closing this issue for now; we can always refer back and re-open it at a later point.

from llvm.

mewmew avatar mewmew commented on August 11, 2024

Maybe it's possible to keep the old history around in a separate git ref which doesn't get cloned by default. But in that case I guess the content would be harder to discover.

Also, if Go ever does shallow Git clones, this issue would be resolved, I think. (upstream issue golang/go#13078)

from llvm.

pwaller avatar pwaller commented on August 11, 2024

I just learned that Go did this to their repository recently, the discussion in there and how they went about it is pretty interesting:

golang/go#28899

I think it probably doesn't change anything with respect to what we might do to this repository.

from llvm.

mewmew avatar mewmew commented on August 11, 2024

I just learned that Go did this to their repository recently, the discussion in there and how they went about it is pretty interesting:

Thanks for the link! It was an interesting read to see how they resolved it.

I think it probably doesn't change anything with respect to what we might do to this repository.

Most likely not. If we end up doing a pruning, then I'd suggest we use bfg, as suggested in the GitHub link you posted. Also, if we do this, it should probably happen in the next few weeks, as the intention is to have v0.3.0 released some time in early December.

I'm still a bit on the fence. I don't think we need the rewrite. However, should we ever do one, now is basically the perfect time, as we move from v0.2 to v0.3, since users will have to make manual changes to get the latest release anyway (updating to the latest API, etc.).

from llvm.

mewmew avatar mewmew commented on August 11, 2024

Until we decide for sure, I'll re-open the issue. This may also help get input from other users of the repo who may be affected. I'll also rename the issue title to include a mention of the Git history rewrite.

from llvm.

pwaller avatar pwaller commented on August 11, 2024

Some large paths:

 git rev-list --objects --all | git cat-file --batch-check='%(objectsize:disk) %(objectname) %(objecttype) %(rest)' | grep ' blob ' | awk '{print $4" "$1}' | awk '{
    arr[$1]+=$2
   }
   END {
     for (key in arr) printf("%s\t%s\n", arr[key], key)
   }' | sort -nr | awk '{print $2"\t"$1}' | column -t -s$'\t' | head
old/asm/internal/testdata/sqlite/sqlite3.ll                                 3404085
old/asm/internal/testdata/sqlite/sqlite3.c                                  1726782
asm/internal/parser/actiontable.go                                          472010
old/asm/internal/parser/actiontable.go                                      246534
asm/internal/parser/gototable.go                                            149186
old/asm/internal/parser/gototable.go                                        67735
asm/testdata/DebugInfo/COFF/big-type.ll                                     61648
asm/internal/ll.bnf                                                         59883
asm/ll/ll.tm                                                                59610
asm/testdata/c4.ll                                                          55555

This graph shows how much space will be saved, assuming you eliminate large file paths:

[image: graph of the space saved by eliminating large file paths]
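For readers who want to see the aggregation step of the pipeline above in isolation, here is a minimal sketch. It assumes input lines of the simplified form `<size> <path>`; the function name and the sample data are made up for illustration. Like the original pipeline, it assumes paths contain no spaces.

```shell
# Sum on-disk blob sizes per path, largest first.
# Input: one "<size> <path>" pair per line (several lines per path are
# expected, one for each revision of that file in the history).
sum_sizes_by_path() {
  awk '{ total[$2] += $1 }
       END { for (path in total) printf "%d\t%s\n", total[path], path }' |
    sort -k1,1nr -k2,2
}

# Hypothetical sample: two revisions of a.ll and one revision of b.go.
printf '%s\n' '300 a.ll' '50 b.go' '200 a.ll' | sum_sizes_by_path
```

Summing per path rather than per blob is what makes files such as sqlite3.ll stand out: every historical revision of a file counts towards the same total.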

from llvm.

pwaller avatar pwaller commented on August 11, 2024

kill_ids.csv

from llvm.

mewmew avatar mewmew commented on August 11, 2024

The current intention is to clone llir/llvm into llir/llvm-legacy, to preserve the complete history. Then, to start clean, we will keep any file currently in HEAD, and its entire history at that path. Since we need to do a force push anyway, this seems to be the time to really get the size of the repo down.

If anyone currently using the repo has some input or feedback, feel welcome to contribute your thoughts.

from llvm.

pwaller avatar pwaller commented on August 11, 2024

@mewmew and I propose to run the following:

$ du --apparent-size -sch .git
9.5M	.git
9.5M	total

# Kill objects at and before v0.2.1
git rev-list --objects v0.2.1 | awk '{print $1}' > killset.txt

# Kill unwanted objects - testdata, textmapper and other experimental code.
git rev-list --objects --all | git cat-file --batch-check='%(objectname) %(rest)' | egrep '(/testdata/| l/|\.tm$)' | awk '{print $1}' >> killset.txt

java -jar ~/Downloads/bfg-1.13.0.jar -bi killset.txt 

git repack -a && git reflog expire --expire=now --all && git gc --prune=now --aggressive

$ du --apparent-size -sch .git
800K	.git
800K	total
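To check what the path filter in the proposal above would select before handing killset.txt to BFG, the filter can be tried on a few sample lines. This is only a sketch: the object ids and the paths `l/example.go` and `ir/module.go` are invented for illustration, and `grep -E` is used as the equivalent, non-deprecated spelling of `egrep`.

```shell
# Select the object ids whose path matches the filter from the proposal.
# Input lines mimic `git cat-file --batch-check='%(objectname) %(rest)'`,
# i.e. "<object id> <path>". The leading space in ' l/' anchors the match
# to a top-level l/ directory, because the path follows a space.
select_unwanted() {
  grep -E '(/testdata/| l/|\.tm$)' | awk '{print $1}'
}

# Hypothetical sample input; only the first three lines should be selected.
printf '%s\n' \
  'aaa1 asm/testdata/c4.ll' \
  'bbb2 asm/ll/ll.tm' \
  'ccc3 l/example.go' \
  'ddd4 ir/module.go' | select_unwanted
```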

from llvm.

pwaller avatar pwaller commented on August 11, 2024

See https://github.com/llir/llvm-clean for the new repository. The intent is to force push the HEAD of that repository into llir/llvm at some point (or to redo the above commands against this repository assuming development continues here for now).

from llvm.

mewmew avatar mewmew commented on August 11, 2024

Here is a list of known users of llir/llvm:

We can try to be good open source citizens and send PRs to update their usage to v0.3.x :) Once this is done, we update llir/llvm with the cleaned version of the repo (currently living at https://github.com/llir/llvm-clean).

Edit: the llir/llvm-clean repository has now been removed, as it's been integrated back into llir/llvm.

from llvm.

pwaller avatar pwaller commented on August 11, 2024

https://github.com/reedkotler/scala-llc doesn't seem to contain any go code?

from llvm.

mewmew avatar mewmew commented on August 11, 2024

https://github.com/reedkotler/scala-llc doesn't seem to contain any go code?

Oh, the code match was from the BNF https://github.com/reedkotler/scala-llc/blob/ff3578b14171a5332e1c7f972c0c40b32f7a9e4c/ll.bnf#L187

<< import (
   "github.com/llir/llvm/asm/internal/ast"
   "github.com/llir/llvm/asm/internal/astx"
) >>

We can remove it from the list.

from llvm.

mewmew avatar mewmew commented on August 11, 2024

I'd like to trim the llir/llvm repo size today, using the approach outlined by @pwaller in #38 (comment). Essentially, the earlier we do this the better, so that we can keep the Git history intact going forward.

from llvm.

mewmew avatar mewmew commented on August 11, 2024

On the 30th of November we pruned the repository using BFG to reduce its initial download size. The following commands were run at the old revision d3f412d.

$ du --apparent-size -sch .git
9.6M	.git
9.6M	total

# Kill objects at and before v0.2.1
git rev-list --objects 7a17b32c1767cfeb5287d164e92865adb98985c8 | awk '{print $1}' > killset.txt

# Kill unwanted objects - testdata, textmapper and other experimental code.
git rev-list --objects --all | git cat-file --batch-check='%(objectname) %(rest)' | egrep '(/testdata/| l/|\.tm$)' | awk '{print $1}' >> killset.txt

bfg -bi killset.txt 

git repack -a && git reflog expire --expire=now --all && git gc --prune=now --aggressive

$ du --apparent-size -sch .git
934K	.git
934K	total

from llvm.
