GitHubStargazersCrawler

A simple crawler that fetches the user profile of every stargazer of a specified GitHub repository via the GitHub API. The crawler persists the data in a SQLite database.

Usage

  1. Install the requirements: pip install -r requirements.txt
  2. Create a GitHub API token and store it as an environment variable: export GITHUB_TOKEN=<your token>
  3. Initialize the database: python init_db.py
  4. Run the crawler: python crawler.py <repository>, for example python crawler.py kuzudb/kuzu
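
Step 2 stores the token in the GITHUB_TOKEN environment variable. As a rough sketch of how a script might pick it up (this is not taken from crawler.py, and the helper name build_headers is hypothetical), the token can be read from the environment and turned into request headers, falling back to unauthenticated requests when it is absent:

```python
import os


def build_headers(env=None):
    """Build GitHub API request headers (hypothetical helper, not from crawler.py)."""
    env = os.environ if env is None else env
    headers = {"Accept": "application/vnd.github+json"}
    token = env.get("GITHUB_TOKEN")
    if token:
        # Authenticated requests are subject to a higher rate limit.
        headers["Authorization"] = f"Bearer {token}"
    return headers
```

Passing a plain dict as env makes the helper easy to check without touching the real environment, for example build_headers({"GITHUB_TOKEN": "abc"}).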

Notes

  • If the crawler is stopped, its current state is saved in the SQLite database. The crawler can be run multiple times for the same repository: it automatically skips users that have already been crawled and continues from the last saved state.
  • The crawler will also automatically handle the GitHub API rate limit. If the rate limit is reached, the crawler will wait until the limit is reset and then continue the crawling process.
  • It is possible to run the crawler for multiple repositories over the same database.
  • It is possible to run the crawler without a GitHub API token, but the rate limit is much lower. Also, some information, such as the email address, will not be available without a token.
  • Within a single run, the crawler tries each user only once: if an error occurs while crawling a user, the request is not retried. When the crawler is restarted, the users that were skipped because of errors are crawled again.
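
The resume and rate-limit behaviour described in these notes might look roughly like the sketch below. It is an illustration under assumptions, not the crawler's actual code: the users table, its columns, and both function names are hypothetical. The X-RateLimit-Remaining and X-RateLimit-Reset response headers are the ones documented for GitHub's REST API.

```python
import sqlite3
import time


def seconds_until_reset(headers, now=None):
    """Return how long to wait before the next request (0 if quota remains)."""
    now = time.time() if now is None else now
    remaining = int(headers.get("X-RateLimit-Remaining", 1))
    if remaining > 0:
        return 0.0
    # X-RateLimit-Reset is a UTC epoch timestamp in seconds.
    reset_at = int(headers.get("X-RateLimit-Reset", now))
    return max(0.0, reset_at - now)


def pending_logins(conn, logins):
    """Drop logins that are already stored in the (hypothetical) users table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS users (login TEXT PRIMARY KEY, profile TEXT)"
    )
    done = {row[0] for row in conn.execute("SELECT login FROM users")}
    return [login for login in logins if login not in done]
```

In a crawl loop, one would sleep for seconds_until_reset(response.headers) before issuing the next request, and write a row only after a profile was fetched successfully. Under that assumption, a user whose request failed has no row and therefore shows up again in pending_logins on the next run.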
