A simple GIT URL parser similar to giturlparse.py.
https://git-url-parse.readthedocs.io/
MIT
When parsing HTTPS URLs, the server URL of the originating repository is not parsed. We need to parse URLs for both github.com and our internal GitHub Enterprise Server instance, and capture the originating server in each case.
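A minimal sketch of how the originating server could be captured, using only the standard library rather than giturlparse's own parser. The function name `server_of` and the host `github.example.org` (standing in for an internal GitHub Enterprise instance) are assumptions made for illustration:

```python
from urllib.parse import urlsplit


def server_of(url):
    """Return the host part of an HTTP(S) Git URL, or None if there is none."""
    # urlsplit only fills in the network location for scheme://host/... URLs.
    return urlsplit(url).hostname


# github.example.org is a made-up stand-in for an enterprise host (assumption).
for url in (
    "https://github.com/coala/git-url-parse.git",
    "https://github.example.org/team/project.git",
):
    print(server_of(url))
```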
This repo should have CI set up, so that it is easy to see whether the tests are passing.
ref: #4 (comment)
test-requirements.txt lists pytest-helpers-namespace, but it appears to be unused.
six and click are never used in the library; it is not clear why they are in the requirements.
Also, the tests behave the same with or without click and six.
https://github.com/nephila/giturlparse/ is giturlparse on PyPI. It has become more active recently.
As the two modules share the installed module name giturlparse, we should either aim to have the same API, or https://github.com/coala/git-url-parse should install itself into git_url_parse, with a shim at giturlparse for backwards compatibility.
SOOS has a free community edition for open source projects and publishes CycloneDX, SPDX, and VEX SBOMs, which will include any vulnerability attestations you make. It would be great to have you as a user! Your current public page is here: https://app.soos.io/research/packages/Python/-/git-url-parse/
Details here:
https://soos.io/products/community-edition
Parsing any URL which doesn't contain a / in the path fails.
For example, this is correct (even if the 'owner' part isn't generally meaningful):
>>> giturlparse.parse('ssh://user@example.com/path/repo.git')
Parsed(pathname='/path/repo.git', protocols=['ssh'], protocol='ssh', href='ssh://user@example.com/path/repo.git', resource='example.com', user='user', port=None, name='repo', owner='path')
But this yields nonsensical results:
>>> giturlparse.parse('ssh://user@example.com/repo.git')
Parsed(pathname='/user@example.com/repo.git', protocols=['ssh'], protocol='ssh', href='ssh://user@example.com/repo.git', resource='ssh', user=None, port=None, name='repo', owner='user@example.com')
This, again, is correct:
>>> giturlparse.parse('user@example.com:path/repo.git')
Parsed(pathname='path/repo.git', protocols=[], protocol='ssh', href='user@example.com:path/repo.git', resource='example.com', user='user', port=None, name='repo', owner='path')
And this fails:
>>> giturlparse.parse('user@example.com:repo.git')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/rczajka/.local/share/virtualenvs/vdeploy/lib/python3.6/site-packages/git_url_parse-1.0.2-py3.6.egg/giturlparse/__init__.py", line 34, in parse
    return p.parse()
  File "/home/rczajka/.local/share/virtualenvs/vdeploy/lib/python3.6/site-packages/git_url_parse-1.0.2-py3.6.egg/giturlparse/parser.py", line 104, in parse
    raise ParserError(msg)
giturlparse.parser.ParserError: Invalid URL 'user@example.com:repo.git'
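For illustration, the scp-like case could be handled by making the owner part optional. This is only a sketch under assumptions: the pattern `SCP_LIKE` and the helper `parse_scp_like` are hypothetical and are not the regexes used in giturlparse's parser.py.

```python
import re

# scp-like syntax: [user@]host:path, where the path may contain no "/" at all.
SCP_LIKE = re.compile(
    r"^(?:(?P<user>[^@/:]+)@)?"   # optional user
    r"(?P<resource>[^:/]+):"      # host, terminated by ":"
    r"(?P<pathname>(?:(?P<owner>.+)/)?(?P<name>[^/]+?)(?:\.git)?)$"
)


def parse_scp_like(url):
    """Return the named parts of an scp-like URL, or None if it is not one."""
    # URLs with an explicit scheme (ssh://, https://, ...) are not scp-like;
    # without this guard "ssh" would be mistaken for the host name.
    if "://" in url:
        return None
    m = SCP_LIKE.match(url)
    return m.groupdict() if m else None


print(parse_scp_like("user@example.com:repo.git"))
print(parse_scp_like("user@example.com:path/repo.git"))
```

With this pattern the owner is simply None when the path has a single component, instead of the whole URL being rejected.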
Some systems, e.g. npm, support specifying a commit or branch by adding #commit at the end of the url; see
https://docs.npmjs.com/files/package.json#git-urls-as-dependencies
After this feature is implemented, one would expect
git clone https://github.com/retr0h/git-url-parse
cd git-url-parse
pip3 install --user -e .
python3 -c 'import giturlparse; print(giturlparse.parse("git@gitlab.com:hi/there#rel-1.1"))'
to output e.g.
Parsed(pathname='hi/there', protocols=[], protocol='ssh', href='git@gitlab.com:hi/there#rel-1.1', resource='gitlab.com', user='git', port=None, name='there', owner='hi', ref='rel-1.1')
or possibly
Parsed(pathname='hi/there', protocols=[], protocol='ssh', href='git@gitlab.com:hi/there', resource='gitlab.com', user='git', port=None, name='there', owner='hi', ref='rel-1.1')
(because arguably href should be what one passes to 'git clone')
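A rough sketch of the splitting step such a feature would need, done before the URL reaches the existing parser. The helper name `split_ref` is hypothetical, and the `ref` field shown above does not exist in the library today:

```python
def split_ref(url):
    """Split 'url#ref' into (url, ref); ref is None when no ref is given."""
    base, sep, ref = url.partition("#")
    # An empty fragment ("...there#") is treated the same as no fragment.
    return base, (ref if sep and ref else None)


print(split_ref("git@gitlab.com:hi/there#rel-1.1"))
print(split_ref("git@gitlab.com:hi/there"))
```

Returning the base URL separately matches the second variant above, where href is what one would pass to 'git clone'.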
I am trying to parse this url:
https://github.com/tterranigma/Stouts.openvpn.git
This is what I am doing:
import giturlparse
name = 'https://github.com/tterranigma/Stouts.openvpn'
print(giturlparse.parse(name))
I get:
Parsed(pathname='//github.com/tterranigma/Stouts.openvpn.git', protocols=['https'], protocol='ssh', href='https://github.com/tterranigma/Stouts.openvpn.git', resource='https', user=None, port=None, name='Stouts.openvpn', owner='/github.com/tterranigma')
The owner field is wrong.
I tried with version 1.1 and it works correctly.
m2 is created even when only m1 is needed.
It would be better to store the regexes in a list, ordered from most common to least common, and try them in turn.
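The suggestion could look roughly like this. The patterns and the names `PATTERNS` and `first_match` are illustrative assumptions, not the library's actual regexes, and the ordering is a guess at what is most common:

```python
import re

# Illustrative patterns only, ordered from (assumed) most to least common.
PATTERNS = [
    ("https", re.compile(r"^https?://(?P<resource>[^/]+)/(?P<pathname>.+)$")),
    ("ssh", re.compile(r"^ssh://(?:(?P<user>[^@/]+)@)?(?P<resource>[^/:]+)/(?P<pathname>.+)$")),
    ("scp", re.compile(r"^(?:(?P<user>[^@/]+)@)?(?P<resource>[^/:]+):(?P<pathname>.+)$")),
]


def first_match(url):
    """Return (label, groups) for the first pattern that matches, else None."""
    # Each pattern is tried only if the earlier ones failed, so no match
    # object is computed unless it is actually needed.
    for label, pattern in PATTERNS:
        m = pattern.match(url)
        if m:
            return label, m.groupdict()
    return None


print(first_match("https://github.com/test/foo"))
print(first_match("git@github.com:test/foo.git"))
```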
>>> p = giturlparse.parse("https://github.com/test/foo/")
>>> p
Parsed(pathname='foo/', protocols=['https'], protocol='ssh', href='https://github.com/test/foo/', resource='test', user=None, port=None, name=None, owner='foo')
>>> p = giturlparse.parse("https://github.com/test/foo")
>>> p
Parsed(pathname='/test/foo', protocols=['https'], protocol='https', href='https://github.com/test/foo', resource='github.com', user=None, port=None, name='foo', owner='test')
Git allows a local path to be used as a repository URL.
Currently, if such a path is passed, the parser puts unhelpful values in the fields of the parsed object.
Possibly there could be some kind of check for an absolute path, so that when a user passes a local repository they get an empty return value.
Reproducing code:
$ git clone https://github.com/retr0h/git-url-parse
$ cd git-url-parse
$ pip install -e .
$ python3
>>> import giturlparse
>>> p = giturlparse.parse("/path/to/local/repository/directory/")
>>> print(p)
What we get is:
>>> print(p)
Parsed(pathname='to/local', protocols=[], protocol='ssh', href='/path/to/local/repository/directory/', resource='path', user=None, port=None, name='local', owner='to')
This could probably be improved: protocol='ssh' is highly misleading, and the name ('local') and resource could also be set to None, with pathname and href being the same (or, preferably, href being None).
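One possible shape for such a check, as a heuristic sketch only. The helper name `looks_like_local_path` is hypothetical, and the set of prefixes treated as local is an assumption (Windows drive paths, for instance, are not covered):

```python
def looks_like_local_path(url):
    """Heuristic: True for absolute paths, ./ and ../ paths, and file:// URLs."""
    # str.startswith accepts a tuple and returns True if any prefix matches.
    return url.startswith(("/", "./", "../", "file://"))


print(looks_like_local_path("/path/to/local/repository/directory/"))
print(looks_like_local_path("https://github.com/retr0h/git-url-parse"))
```

A caller could run this before the regex matching and return an empty result (or raise) for local paths, instead of reporting protocol='ssh'.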
>>> import giturlparse
>>> giturlparse.parse("https://github.com/sphinx-doc/sphinx.git")
Parsed(pathname='/sphinx-doc/sphinx.git', protocols=['https'], protocol='https', href='https://github.com/sphinx-doc/sphinx.git', resource='github.com', user=None, port=None, name='sphinx', owner='sphinx-doc')
>>> giturlparse.parse("https://github.com/sphinx-doc/sphinx")
Parsed(pathname='/sphinx-doc', protocols=['https'], protocol='https', href='https://github.com/sphinx-doc/sphinx', resource='github.com', user=None, port=None, name='sphinx-doc', owner=None)
Here are our issues:
I think it would be sensible to add a test job for Python 3.4, as the minimum supported Python 3 version.
The library supports Bitbucket's HTTPS Git URL style, i.e.
https://[email protected]/virresh/gittest
But there is no test for it, so compatibility with this style risks being broken.
A simple test to check this should be sufficient.
When the URL format is:
https://gitlab.example.com/folder1/folder2/ansible-role-name.git
the owner will be matched as /gitlab.example.com/folder1/folder2.
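As a sketch of the expected behaviour, the owner of a nested-group URL can be taken as everything in the path before the last component, with the host kept out of it. This uses the standard library only; `owner_and_name` is a hypothetical helper, not part of giturlparse:

```python
from urllib.parse import urlsplit


def owner_and_name(url):
    """Split an HTTPS Git URL path into (owner, name), keeping nested groups."""
    # urlsplit separates the host from the path, so the host never leaks
    # into the owner.
    path = urlsplit(url).path.strip("/")
    if path.endswith(".git"):
        path = path[: -len(".git")]
    owner, _, name = path.rpartition("/")
    return (owner or None), name


print(owner_and_name("https://gitlab.example.com/folder1/folder2/ansible-role-name.git"))
```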
https:// URLs without .git are a ParserError. GitHub and GitLab both support URLs without the .git (but GitLab emits a warning).
Under version 1.2.0, this URL, which correctly specifies a branch, PROD, works fine:
In [1]: import giturlparse; giturlparse.parse('git+ssh://git@gitlab.my_dept.uw.edu/uwmydept/my_repo.git@PROD')
Out[1]: Parsed(pathname='/uwmydept/my_repo.git', protocols=['git', 'ssh'], protocol='ssh', href='git+ssh://git@gitlab.my_dept.uw.edu/uwmydept/my_repo.git@PROD', resource='gitlab.my_dept.uw.edu', user='git', port=None, name='my_repo', owner='uwmydept')
But under 1.2.1 or 1.2.2 it gets an error:
In [1]: import giturlparse; giturlparse.parse('git+ssh://git@gitlab.my_dept.uw.edu/uwmydept/my_repo.git@PROD')
.....
~/.virtualenvs/my_dir/lib/python3.6/site-packages/giturlparse/parser.py in parse(self)
    102         else:
    103             msg = "Invalid URL '{}'".format(self._url)
--> 104             raise ParserError(msg)
    105
    106         return Parsed(**d)
ParserError: Invalid URL 'git+ssh://git@gitlab.my_dept.uw.edu/uwmydept/my_repo.git@PROD'
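Until this is fixed, a caller could strip the trailing @branch before parsing. The sketch below uses only the standard library. The helper `split_at_ref` and the host `gitlab.example.com` are hypothetical, and it assumes the ref is always the text after the last "@" in the path:

```python
from urllib.parse import urlsplit


def split_at_ref(url):
    """Split a 'scheme://[user@]host/path@ref' URL into (url, ref)."""
    parts = urlsplit(url)
    # Only look for "@" in the path, so the "user@" before the host is untouched.
    path, sep, ref = parts.path.rpartition("@")
    if not sep:
        return url, None
    return parts._replace(path=path).geturl(), ref


print(split_at_ref("git+ssh://git@gitlab.example.com/group/my_repo.git@PROD"))
```

The first element can then be passed to giturlparse.parse, with the branch name kept alongside it.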