githubcloner's People

Contributors

0ca, clatze, danielhoherd, games647, mazen160, medwig, patrickauld, qkzk, soulmerge

githubcloner's Issues

[!] Error: Github API rate limit exceeded

Hi,

I get "[!] Error: Github API rate limit exceeded". Is there an option to prevent it?

The command was: githubcloner.py --org DevExpress-Examples -o E:\DevExpress

Regards
Martin
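
GitHub's REST API gives unauthenticated clients a much lower hourly request quota than authenticated ones, so passing credentials (the --authentication option appears in another report on this page) is the usual way to raise the limit. The API also reports the remaining quota in the X-RateLimit-Remaining and X-RateLimit-Reset response headers. The sketch below shows one way to read them; the function name and the sample values are hypothetical, and it is not part of githubcloner.py.

```python
import time


def seconds_until_reset(headers, now=None):
    """Return how long to wait before retrying, or 0 if quota remains.

    `headers` is a mapping of response headers from api.github.com.
    A plain dict is case-sensitive, so the keys must match exactly.
    """
    now = time.time() if now is None else now
    remaining = int(headers.get("X-RateLimit-Remaining", 1))
    reset_at = int(headers.get("X-RateLimit-Reset", 0))
    if remaining > 0:
        return 0
    return max(0, reset_at - int(now))


# Hypothetical header values, for illustration only.
sample = {"X-RateLimit-Remaining": "0", "X-RateLimit-Reset": "1700000060"}
print(seconds_until_reset(sample, now=1700000000))
```

A caller could sleep for the returned number of seconds before requesting the next page instead of aborting.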

Inconsistent results for same user/org

I have seen several instances of inconsistent behavior when cloning the same user or org. I observed this while iterating on the "API rate limit exceeded" issue and the incomplete repository issue.

Here is the cycle I am using to test iterations:

cd
rm -rf "${HOME}/gitclone-test-"* "${HOME}/.mrconfig"
"${HOME}/code/GithubCloner/githubcloner.py" --user mazen160 -o "${HOME}/gitclone-test-$(date +%s)/"
find "${HOME}/gitclone-test"* -maxdepth 1 -mindepth 1 -type d | xargs -n1 mr register
mr status

I chose mazen160 for this example because it is a public data set within the author's control, but I have observed it with both organizations and other users.

After running the above cycle 7 times, I observed these results (the left column is the number of times each repository appeared):

5  https://github.com/mazen160/Firefox-Security-Toolkit.git
5  https://github.com/mazen160/SecLists.git
5  https://github.com/mazen160/bfac.git
5  https://github.com/mazen160/ct-monitor.git
5  https://github.com/mazen160/struts-pwn_CVE-2017-9805.git
6  https://github.com/mazen160/Ubuntu-Desktop-Malware-Vector-Demo.git
6  https://github.com/mazen160/dirsearch.git
6  https://github.com/mazen160/dnsrecon.git
6  https://github.com/mazen160/public.git
6  https://github.com/mazen160/server-status_PWN.git
7  https://github.com/mazen160/GithubCloner.git
7  https://github.com/mazen160/ptf.git
7  https://github.com/mazen160/struts-pwn.git
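
A tally like the one above can also be produced in Python. This is a minimal sketch that assumes the clone URLs of each run have already been collected into lists; the function names and the example URLs are hypothetical.

```python
from collections import Counter


def tally_runs(runs):
    """Count in how many runs each repository URL appeared."""
    counts = Counter()
    for urls in runs:
        counts.update(set(urls))  # count each URL at most once per run
    return counts


def incomplete(counts, total_runs):
    """Return the URLs that were missing from at least one run."""
    return sorted(url for url, n in counts.items() if n < total_runs)


# Hypothetical data: three runs, one of which missed a repository.
runs = [
    ["https://github.com/example/a.git", "https://github.com/example/b.git"],
    ["https://github.com/example/a.git"],
    ["https://github.com/example/a.git", "https://github.com/example/b.git"],
]
counts = tally_runs(runs)
print(incomplete(counts, len(runs)))
```

Any URL reported by incomplete() was skipped in at least one run, which is the inconsistency described here.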

Option to exclude some repos

Hello @mazen160 !

Thanks a lot for the script, I use it daily to archive my work.

I'm a teacher and I use Github Classroom for my students. It creates a copy of an assignment repo for each of them, where they push their work. It works fine and I can use some CI to validate tests, receive notifications, etc.

Since I'm the owner of those repos, they are archived by your script too... It's not what I want and I couldn't find a way to exclude them.

So I forked your script and added the option. It's here.

The usage is quite simple:

python githubcloner.py ... --exclude_repos repo1,repo2,repo3...

If any string from this list is present in a repository's URL, that repository will be excluded.
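
The substring rule described above could look roughly like this. It is a sketch of the idea only, not the code from the fork; both function names and the example URLs are hypothetical.

```python
def parse_exclude_option(value):
    """Split a comma-separated option value, ignoring empty entries."""
    return [s.strip() for s in value.split(",") if s.strip()]


def exclude_repos(urls, excluded):
    """Drop every URL that contains any of the excluded substrings."""
    return [url for url in urls if not any(s in url for s in excluded)]


# Hypothetical clone URLs.
urls = [
    "https://github.com/teacher/lecture-notes.git",
    "https://github.com/teacher/assignment1-student-a.git",
    "https://github.com/teacher/assignment1-student-b.git",
]
print(exclude_repos(urls, parse_exclude_option("assignment1,")))
```

Because the match is a plain substring test, a short pattern such as repo1 would also exclude repo10, so patterns should be chosen with that in mind.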

I also formatted the code to be a little more Pythonic (I already told you I'm a teacher, and it's second nature...).

If you want I can make a PR.

Sometimes repository is not fully cloned

$ ${HOME}/code/GithubCloner/githubcloner.py --user mazen160 -o "${HOME}/gitclone-test-$(date +%s)/"
...snip...
$ cd ${HOME}/gitclone-test-1526924358/mazen160_SecLists
$ git status --porcelain | awk '{print $1}' | sort | uniq -c
   5 ??
 438 D
$ git reset --hard origin/master
Checking out files: 100% (438/438), done.
HEAD is now at 7bbc06c Added @mazen160 wordlist for common web API endpoints.
/Users/daniel.hoherd/gitclone-test-1526924358/mazen160_SecLists $ git status --porcelain | awk '{print $1}' | sort | uniq -c
/Users/daniel.hoherd/gitclone-test-1526924358/mazen160_SecLists $

I was previously investigating this bug when I hit the "API rate limit exceeded" issue, so this error is not related to those recent changes.
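
The shell pipeline above (git status --porcelain piped through awk, sort and uniq -c) can be mirrored in Python to check many clones at once. A minimal sketch, assuming the porcelain output has already been captured as text; the function name and the sample lines are hypothetical.

```python
from collections import Counter


def summarize_porcelain(output):
    """Count entries per status code in `git status --porcelain` output."""
    counts = Counter()
    for line in output.splitlines():
        if line.strip():
            # The first two columns hold the status code, e.g. " D" or "??".
            counts[line[:2].strip()] += 1
    return counts


# Hypothetical output: two deleted tracked files and one untracked file.
sample = " D README.md\n D Passwords/list.txt\n?? notes.txt\n"
print(summarize_porcelain(sample))
```

An empty result means the working tree matches HEAD; any D entries right after a fresh clone point to the incomplete checkout described here.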

Output path not specified

I'm getting this on Windows. PyCharm works.

githubcloner.py --help
Error: The output path is not specified.
Exiting...

Any hints?
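
One way to keep --help working regardless of other checks is to let argparse enforce the output option itself, since argparse handles --help before it validates required arguments. This is a sketch under the assumption that the script parses options with argparse; the long option name --output-path is hypothetical, and only -o is taken from the commands on this page. It does not diagnose why the arguments are not reaching the script on Windows.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="Clone GitHub repositories.")
    # Hypothetical option definition; argparse reports a missing -o itself.
    parser.add_argument("-o", "--output-path", required=True,
                        help="directory to clone into")
    return parser


# Parse an explicit argument list so the example does not depend on sys.argv.
args = build_parser().parse_args(["-o", "out"])
print(args.output_path)
```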

gists are always pulled

Gists are always pulled, whether or not --include-gists is used.

I added debugging of userGists in a branch of my fork and that function is never being called, so gists are being populated by something else.

Some repository directories have truncated names

I have noticed some repositories with truncated names even though they have fully functioning remotes. Often this split involves the letter 'i'.

For instance:

  • githubcloner.py --user mazen160 gives dir mazen160_Firefox-Security-Toolk with origin https://github.com/mazen160/Firefox-Security-Toolkit.git
  • githubcloner.py --user danielhoherd gives dir danielhoherd_pre-commit-circlec with origin https://github.com/danielhoherd/pre-commit-circleci.git
  • githubcloner.py --org github gives dir github_puppet-ca_cer with origin https://github.com/github/puppet-ca_cert.git
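
All three truncated names above are exactly what Python's str.rstrip(".git") returns, because rstrip() removes any trailing run of the characters '.', 'g', 'i' and 't' rather than the literal suffix. Whether githubcloner.py actually calls rstrip this way is an assumption; the code is not shown here. The sketch below reproduces the symptom and shows a suffix-safe alternative; the helper name is hypothetical.

```python
def strip_git_suffix(name):
    """Remove one trailing ".git" suffix, if present."""
    suffix = ".git"
    return name[:-len(suffix)] if name.endswith(suffix) else name


for name in ("Firefox-Security-Toolkit.git",
             "pre-commit-circleci.git",
             "puppet-ca_cert.git"):
    # rstrip() treats its argument as a set of characters, not a suffix.
    print(name.rstrip(".git"), "vs", strip_git_suffix(name))
```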

Where is the output saved ?

So I ran this command: python3 githubcloner.py --org XXXX -o /output. However, the output folder is empty on my computer. I think I misunderstood the tool, but I can't find any help online. Where am I supposed to find the output results?

'import queue' needs to be changed

After cloning the repo and attempting to run it, I ran into a few dependencies that needed to be installed. After installing argparse and GitPython, I had to change the import queue line, on line 21, to import Queue as queue.
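
The module is named queue on Python 3 and Queue on Python 2, so a guarded import lets the same file run on both. A minimal sketch of that pattern:

```python
try:
    import queue  # Python 3 module name
except ImportError:
    import Queue as queue  # Python 2 module name

q = queue.Queue()
q.put("https://github.com/mazen160/GithubCloner.git")
print(q.get())
```

With this form, the rest of the script can keep using queue.Queue unchanged.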

TypeError: __init__() got an unexpected keyword argument 'daemon'

Firstly, thanks for this tool, Mazen! :)

Running on macOS:

Omars-MacBook-Air:GithubCloner omarkurt$ sudo python githubcloner.py --user omarkurt -o /omarkurt
Traceback (most recent call last):
  File "githubcloner.py", line 250, in <module>
    main()
  File "githubcloner.py", line 245, in main
    cloneBulkRepos(URLs, output_path, threads_limit=threads_limit)
  File "githubcloner.py", line 166, in cloneBulkRepos
    threading.Thread(target=cloneRepo, args=(URL, cloningPath,), daemon=True).start()
TypeError: __init__() got an unexpected keyword argument 'daemon'
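
The daemon keyword argument of threading.Thread was added in Python 3.3, so this traceback is consistent with the script being run by a Python 2 interpreter. Setting the daemon attribute after construction works on both Python 2.6+ and Python 3. In the sketch below, clone_repo is a stand-in that only records its arguments, and the URL and path are hypothetical.

```python
import threading

results = []


def clone_repo(url, path):
    # Stand-in for the real cloning function; it only records its arguments.
    results.append((url, path))


thread = threading.Thread(
    target=clone_repo,
    args=("https://github.com/example/repo.git", "/tmp/example_repo"),
)
thread.daemon = True  # attribute form, accepted by Python 2 and Python 3
thread.start()
thread.join()
print(results)
```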

Move from GitPython to a direct Git call

GitPython has caused many problems: on multiple occasions cloning failed without printing any indication of the failure. Users have reported this several times.

The best way to solve it is by having our own wrapper that directly uses Git.
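
A wrapper of that kind might look like the following. It is a sketch only, not the project's implementation: the function name and the error-message format are hypothetical, and it relies on the git executable being on PATH. The point is that a failed clone is always reported and reflected in the return value.

```python
import os
import subprocess


def clone_repo(url, dest):
    """Clone `url` into `dest` with the git CLI; return True on success."""
    # GIT_TERMINAL_PROMPT=0 makes git fail instead of waiting for credentials.
    env = dict(os.environ, GIT_TERMINAL_PROMPT="0")
    try:
        proc = subprocess.run(
            ["git", "clone", "--quiet", url, dest],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
            env=env,
        )
    except FileNotFoundError:
        print("[!] Error: the git executable was not found")
        return False
    if proc.returncode != 0:
        print("[!] Error cloning %s: %s" % (url, proc.stderr.strip()))
        return False
    return True
```

A caller can collect the URLs for which clone_repo() returned False and retry or list them at the end of the run.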

--only-type (forks|sources)

It would be nice to be able to ignore forks and only download source repos. Perhaps we could have a --only_type source option? Not sure if this is intuitive with Github's API or not.
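
Each repository object returned by GitHub's REST API carries a boolean fork field, so this filter can be applied on the client side after the listing is fetched. The sketch below assumes the repositories are already decoded into dictionaries; the function name, the only_type values and the sample objects are hypothetical.

```python
def filter_by_type(repos, only_type=None):
    """Keep forks, sources, or everything, based on each repo's `fork` flag."""
    if only_type is None:
        return list(repos)
    if only_type not in ("forks", "sources"):
        raise ValueError("only_type must be 'forks' or 'sources'")
    want_fork = only_type == "forks"
    return [r for r in repos if bool(r.get("fork")) == want_fork]


# Hypothetical, trimmed-down API objects.
repos = [
    {"name": "original-project", "fork": False},
    {"name": "forked-project", "fork": True},
]
print([r["name"] for r in filter_by_type(repos, "sources")])
```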

ModuleNotFoundError: No module named 'git'

Hi, sorry for being new to this. Trying to run your script, get this error:

Traceback (most recent call last):
  File "./githubcloner.py", line 18, in <module>
    import git
ModuleNotFoundError: No module named 'git'
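
The git module is provided by the GitPython package, which can be installed with pip install GitPython. The sketch below checks for the module before importing it so that a clearer message can be shown; the helper name and the message wording are hypothetical.

```python
import importlib.util


def module_available(name):
    """Return True if `import name` would find a top-level module."""
    return importlib.util.find_spec(name) is not None


if not module_available("git"):
    print("The 'git' module is missing; it is provided by the GitPython "
          "package (pip install GitPython).")
```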

Output to user/repo instead of user_repo

Hi,

Thanks for this tool. It's pretty handy. I think it would be useful to have an option to download to user/repo. The default user_ prefix doesn't make much sense, at least in my opinion. A user/repo layout would make it easier to mirror GitHub and get the output you'd expect. Otherwise, I would have to use -o user/ instead of just -o . and specify all the users I want to clone.
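
The two layouts differ only in how the target directory is built from the clone URL. A minimal sketch of both, assuming URLs of the form https://github.com/user/repo.git; the function name and the nested flag are hypothetical.

```python
import os


def clone_path(base, url, nested=False):
    """Build the target directory for a clone URL.

    nested=False gives base/user_repo (the current layout);
    nested=True gives base/user/repo (the requested layout).
    """
    parts = url.rstrip("/").split("/")
    user, repo = parts[-2], parts[-1]
    if repo.endswith(".git"):
        repo = repo[:-len(".git")]
    if nested:
        return os.path.join(base, user, repo)
    return os.path.join(base, user + "_" + repo)


url = "https://github.com/mazen160/GithubCloner.git"
print(clone_path("out", url))
print(clone_path("out", url, nested=True))
```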

Orgs no longer appear to be downloaded

Orgs no longer appear to be downloaded. This worked a few weeks ago. I have added logging about it into a branch in my fork: https://github.com/danielhoherd/GithubCloner/tree/logging

"renovo" is an org I am a part of that has many private repositories, yet none of them are listed when I use this tool as such:

$ LOGLEVEL=DEBUG /Users/dho/code/GithubCloner/githubcloner.py --org renovo -o "$HOME/github-clone-renovo-$(date +%s)" --include-authenticated-repos --authentication "danielhoherd:$GITHUB_API_TOKEN"
...snip...
githubcloner.py:150 DEBUG: fromOrg is beginning
connectionpool.py:824 DEBUG: Starting new HTTPS connection (1): api.github.com
connectionpool.py:396 DEBUG: https://api.github.com:443 "GET /orgs/renovo/repos?per_page=40000000&page=1 HTTP/1.1" 200 None
githubcloner.py:241 DEBUG: Response type is: <class 'list'>
connectionpool.py:824 DEBUG: Starting new HTTPS connection (1): api.github.com
connectionpool.py:396 DEBUG: https://api.github.com:443 "GET /orgs/renovo/repos?per_page=40000000&page=2 HTTP/1.1" 200 2
githubcloner.py:241 DEBUG: Response type is: <class 'list'>
githubcloner.py:171 DEBUG: fromOrg is returning URLs with length: 1
githubcloner.py:172 DEBUG: fromOrg returned URLs contain: ['git://github.com/renovo/hello-world-ci.git']
...snip...
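
The log above requests per_page=40000000, but GitHub's REST API documents a maximum of 100 items per page, so listings are normally fetched page by page until an empty page comes back. The sketch below shows that loop with the HTTP call abstracted away; fetch_page is a hypothetical callable, and the fake data source only stands in for the API. This does not by itself explain why the private repositories are missing.

```python
def list_all_repos(fetch_page, per_page=100):
    """Collect results page by page until an empty page is returned.

    `fetch_page(page, per_page)` is a hypothetical callable that returns the
    decoded JSON list for one page, e.g. from /orgs/<org>/repos.
    """
    repos, page = [], 1
    while True:
        batch = fetch_page(page, per_page)
        if not batch:
            break
        repos.extend(batch)
        page += 1
    return repos


# Fake data source standing in for the API: 250 repositories.
fake = [{"name": "repo%d" % i} for i in range(250)]


def fake_fetch(page, per_page):
    start = (page - 1) * per_page
    return fake[start:start + per_page]


print(len(list_all_repos(fake_fetch)))
```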
