Comments (8)
> Today, the only way to efficiently handle cloning a large repository using `west update` that I am aware of is to limit the fetch depth as in `-o=--depth=`.

There are other (and not mutually exclusive) optimizations:

- Consider adding config option for treeless clones (`--fetch-opt=--filter=...`) #638
- Support for Treeless clones actions/checkout#1152
- performance: How long things take

> must be deep enough to include the specific SHA listed in the west.yml file, or else the update step fails.

I don't think this is how `west` typically works. Take a look at

> An alternative to limiting the fetch depth is to share objects with a local reference repository

Did you look at `west update -h`?
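For readers who haven't used these options, a rough sketch of what each looks like (the west flag spelling follows the quote above; `--filter=tree:0` is Git's treeless-clone filter, and the URL is only a placeholder):

```sh
# Shallow fetch: only works if the manifest SHA is within the depth.
west update -o=--depth=1

# Treeless clone at the Git level (what #638 asks west to expose):
# all commits are fetched, trees and blobs are downloaded on demand.
git clone --filter=tree:0 https://example.com/big-repo.git
```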
The caching mechanism added by @mbolivar-nordic in c50d342 does not actually take advantage of Git's object sharing. It still clones all of the objects and the entire history of the repository; it only does so locally rather than over the network, which of course is an improvement, but not what's being asked here. The crucial step is to set up a `.git/objects/info/alternates` file. Using the `--shared` option when running `git-clone` does that. Other tools that use Git would have to create this file themselves, which is fairly easy to do. That would result in a much smaller size for the `.git` directory in the project workspace as well, thus saving a lot of disk space in addition to speeding up creation of the work tree.
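To make the mechanism concrete, a minimal sketch, with `/mirrors/zephyr.git` standing in for a hypothetical local cache:

```sh
# git-clone can write the alternates file itself; --shared only
# works when the source repository is on the local filesystem.
git clone --shared /mirrors/zephyr.git zephyr
cat zephyr/.git/objects/info/alternates
# -> /mirrors/zephyr.git/objects

# A tool that does not go through git-clone creates the same file by
# hand, from inside an initialized repository: one absolute path to a
# borrowed objects directory per line.
echo /mirrors/zephyr.git/objects > .git/objects/info/alternates
```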
On a related note, using `git-init` and `git-fetch` is much preferred over using `git-clone`. In other words,

```sh
git init
git remote add <remote_name> <remote_url>
# set up .git/objects/info/alternates to point to objects in <local_cache>
git fetch <remote_name>
```

is much better than

```sh
git clone <local_cache>
git remote set-url <remote_name> <remote_url>
git fetch ...
```

which seems to be how west works when given a cached repository.
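Concretely, the preferred sequence might look like this (the remote name, URL, and cache path are all hypothetical):

```sh
git init zephyr && cd zephyr
git remote add origin https://github.com/zephyrproject-rtos/zephyr
echo /mirrors/zephyr.git/objects > .git/objects/info/alternates
git fetch origin
git checkout --detach FETCH_HEAD   # or the SHA pinned in west.yml
```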
> That would result in a much smaller size for the .git directory in the project workspace as well,

Much smaller disk space... if you don't count the initial repos.

> only does so locally rather than over the network, which of course is an improvement, but not what's being asked here

There is no doubt `--shared` would be a big optimization. But as with any optimization work, the most important question is: "How much?". More precisely: how much compared to existing optimizations? Greatly increasing the complexity of the code base for saving a few percent would never be worth it.

So far you haven't provided any number, not even an order of magnitude. You don't sound like you've explored all available options either: your first sentence at the top is "--depth is the only efficient way I'm aware of", which is incorrect.
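For what it's worth, that comparison is straightforward to run locally; a rough sketch, with `/cache/zephyr.git` as a hypothetical pre-fetched cache:

```sh
# Full local clone (what the current cache mechanism amounts to):
# every object is copied into the new workspace.
time git clone /cache/zephyr.git full-copy
du -sh full-copy/.git

# Shared clone: only the alternates file is written; objects stay
# in the cache and are borrowed from there.
time git clone --shared /cache/zephyr.git shared-copy
du -sh shared-copy/.git
```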
Interactive users clone very rarely from scratch. In our CI, `west update` takes 1-2 minutes from scratch (using the existing optimizations I listed), which is acceptable for us. We need some time to run tests anyway.

So what is your use case? Development normally happens to fix tangible and measurable issues, not just "cool ideas".

Before implementing one of the existing optimizations, @mbolivar-ampere spent a lot of time performing some measurements. You can find those at one of the links I shared above if you're interested.

> On a related note, using git-init and git-fetch is much preferred over using git-clone.

`west` used to do this but it was changed in e283d99.
> > That would result in a much smaller size for the .git directory in the project workspace as well,
>
> Much smaller disk space... if you don't count the initial repos.

Think of many concurrent workspaces, not just a single one.

> > only does so locally rather than over the network, which of course is an improvement, but not what's being asked here
>
> There is no doubt `--shared` would be a big optimization. But as with any optimization work, the most important question is: "How much?".

Multiple workspaces sharing the same Git objects is very clearly a huge advantage, both in terms of storage and speed of checkout. Imagine N users, or many concurrent CI jobs, using the same Git mirrors on some local NFS share. N workspaces sharing the same repository histories is very clearly an advantage over N workspaces with N replications of the same history. And it's faster.
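For illustration, the setup being described might look like this (the mirror path and URLs are hypothetical):

```sh
# One mirror on shared storage, refreshed periodically:
git clone --mirror https://github.com/zephyrproject-rtos/zephyr \
    /nfs/mirrors/zephyr.git

# Each of the N workspaces borrows objects instead of copying them:
git clone --reference /nfs/mirrors/zephyr.git \
    https://github.com/zephyrproject-rtos/zephyr workspace-1/zephyr
git clone --reference /nfs/mirrors/zephyr.git \
    https://github.com/zephyrproject-rtos/zephyr workspace-2/zephyr

# Each workspace's .git now stores only the objects missing from the
# mirror. Caveat: objects must never be pruned from the mirror while
# clones still borrow them, or those clones become corrupted.
```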
> More precisely: how much compared to existing optimizations?

I'm not going to do that comparison. But feel free to do some Google searching on the advantages of sharing Git objects with a reference repository.

> Greatly increasing the complexity of the code base for saving a few percent would never be worth it.

What exactly is the complexity? And no, it's not a few percent.

> So far you haven't provided any number, not even an order of magnitude.

It should be self-evident: a single replicated history, which is constant, versus N replicated histories.

> You don't sound like you've explored all available options either: your first sentence at the top is "--depth is the only efficient way I'm aware of", which is incorrect.

I indeed have. As I said, the caching implementation in west is mediocre at best and doesn't address the issue of object sharing.

> Interactive users clone very rarely from scratch.

Now I am going to challenge statements like this -- please provide some numbers. How many users? How often?

> In our CI, `west update` takes 1-2 minutes from scratch (using the existing optimizations I listed), which is acceptable for us. We need some time to run tests anyway.

The assumption in that statement is that the Git repositories involved are small. But what if large repositories are involved, and they may not be using LFS?

> So what is your use case?

I mentioned that earlier -- many concurrent workspaces using large repositories.

> Development normally happens to fix tangible and measurable issues, not just "cool ideas".

I appreciate that.

> Before implementing one of the existing optimizations, @mbolivar-ampere spent a lot of time performing some measurements. You can find those at one of the links I shared above if you're interested.

I'll take a look.

> > On a related note, using git-init and git-fetch is much preferred over using git-clone.
>
> `west` used to do this but it was changed in e283d99.
> > Interactive users clone very rarely from scratch.
>
> Now I am going to challenge statements like this -- please provide some numbers. How many users? How often?

You're the one asking for a "clearly", "self-evident" new feature, without providing any numbers, a reproducible use case, an example, measurements of the existing optimizations, prototype code, or any offer to contribute or help[1]. You seem to have a performance problem to solve[2]. I don't.

Now answering your question anyway:

- Doctor, it hurts when I keep cloning from scratch interactively.
- Don't.

> > You don't sound like you've explored all available options
>
> I indeed have.

Then share some reproducible examples and actual data, not "self-evidence".

[1] "I'll leave it to the feature designer/developer..." - who is that? "I'm not going to do that comparison. Feel free to Google..."
[2] assuming it's not an XY problem: https://en.wikipedia.org/wiki/XY_problem
> What exactly is the complexity?

This was just an example. Every feature and code addition increases complexity - and bugs, and maintenance costs. If you take a quick look at the git log, you'll notice this project is not really staffed with an army of full-time developers. Not even one full-time developer, in fact; very far from it.

I have no idea what the complexity would be in this particular case, but your description of the new feature is not exactly short while still leaving a lot of open questions. If you think this would be a small effort, then I can't wait for your pull requests (with some sample data to back them up). Don't forget the test code.
It is desirable for a tool built on top of Git to allow using the facilities Git offers for dealing with various complexities, particularly cloning large repositories. west is deficient on that front because it does not support Git's object sharing mechanism, which is a well-known and primary feature of the tool. While I'd like to motivate the feature request with numbers, suffice it to say that some development and test environments rely on object sharing. I ask that the reader refer to the wealth of literature available on this topic to learn more.

About the problem statement being long: sure, it could have been more concise, but thoroughness was the goal.

I realize and appreciate that, with limited time and resources, feature requests have to be addressed judiciously.

I can take a stab at extending west to add the desired behavior. I'll make a pull request if I decide that what I have is presentable. And I certainly hope that the conversation then goes a little better.
Turns out, this feature request is closely related to (really a duplicate of) #625.