
scrape-up's Introduction

🕷️ Scrape Up

An alternative to APIs, Scrape Up is a Python package for web scraping. It lets you extract data programmatically from platforms such as GitHub, Twitter, and Instagram, or from any other website that contains valuable information.





License

License: MIT

Terms and conditions for use, reproduction, and distribution are under the MIT License.


Contribute to this project under CODEPEAK 2023


What is CODEPEAK? 🤔

CODE PEAK is a month-long program that helps students understand the paradigm of Open Source contribution and gives them real-world software development experience. The event targets first-timers who wish to participate in Free and Open Source (FOSS) contributions, as well as experienced developers who want to show their skills by contributing to real-world projects.

Learn more about it here


Why Scrape Up? 👀

  • Flexible Scraping: Customize and define the specific data you want to extract from different platforms.
  • Easy-to-Use: Intuitive Python package interface for both beginners and experienced developers.
  • Multiple Platforms: Scrape data from various platforms, including GitHub, Twitter, Instagram, and more.
  • Efficient and Fast: Designed for efficient and reliable scraping of data from multiple sources.

How to use it? ✨

  1. Install the package using pip:
pip install scrape-up --upgrade
  2. Import the required module and instantiate an object with the necessary parameters:
# Import the required module
from scrape_up import github

# Instantiate an object with the username
user = github.Users(username="nikhil25803")
  3. Call the desired method to scrape the required information. For example, to extract the number of followers of a user:
# Call the followers method
followers_count = user.followers()

# Print the output
print(followers_count)

Output:

83
  4. Explore all the available methods provided by Scrape Up on different platforms here.

Happy scraping! 🕸️

The goal 🎯

In our project journey, we encountered several challenges, including request timeouts and rate limits. To overcome these limitations, we developed a Python tool based on web scraping. Our goal is to provide an alternative to APIs for extracting data from various platforms, including GitHub, Twitter, Instagram, and any other website that contains valuable information. Here's what our project aims to achieve:

With our web-scraping-based Python tool, you can unlock a world of data and overcome the limitations often encountered when relying solely on APIs.

✨ Thank You for Your Contribution!

🌟 We value the time and effort you put into contributing, and we look forward to reviewing and merging your contributions. Together, let's make web scraping a powerful and accessible tool for extracting data from various platforms.




scrape-up's People

Contributors

alwenpy, ayushanand308, brohithkr, codesleep-beperfect, codingis4noobs2, dyuthivivek, gayathrimaneksha, hereisswapnil, jaivsh, juhibhojani, kanishkasah20, karthikbhandary2, madmaverickminion, mahitej28, mihan786chistie, neokd, nikhil25803, nishitbaria, palavenkireddy, pooranjoyb, prady0t, ritik48, roberanegussie, rubyseher, rudy3333, samejima-san, surajdeotiwari, sushilverma002, tanishkunigiri, whoisjayd


scrape-up's Issues

Issues opened in a repository.

Proposed Method:

.issues_count()

Description

Create a method to get the number of issues opened in a repository.

To what class it belongs?:

Repository

Create test files.

Now that the methods of the Users class are done, I need someone to write a test file that checks whether all the methods work correctly.

  • Create a test folder in /src
  • Within that directory, create a github_test.py.
  • Include all your test scripts there, covering the methods that have been created so far.
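
A test file along these lines could look like the sketch below. It is only an illustration: the `Users` class here is a hypothetical stand-in for the real one in `scrape_up.github`, and `_fetch_profile_html` and the `data-followers` markup are assumptions made so the test can run offline with a patched page fetch.

```python
import unittest
from unittest import mock


class Users:
    """Hypothetical stand-in for scrape_up.github.Users (illustration only)."""

    def __init__(self, username):
        self.username = username

    def _fetch_profile_html(self):
        # The real class would download the profile page here.
        raise NotImplementedError("no network access in this sketch")

    def followers(self):
        # Assumed markup: <span data-followers="83"></span>
        html = self._fetch_profile_html()
        marker = 'data-followers="'
        start = html.index(marker) + len(marker)
        return int(html[start:html.index('"', start)])


class UsersTest(unittest.TestCase):
    def test_followers_returns_count(self):
        fake_page = '<span data-followers="83"></span>'
        user = Users(username="nikhil25803")
        # Replace the page fetch so the test never touches the network.
        with mock.patch.object(Users, "_fetch_profile_html", return_value=fake_page):
            self.assertEqual(user.followers(), 83)


def run_tests():
    """Run the test case and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(UsersTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Calling `run_tests()` runs the single test case; in a real `github_test.py` the stand-in class would be replaced by an import of the package's own class.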

Peoples in an organization.

Proposed Method

.peoples()

Description

Create a method to scrape the number of people who are part of the organization.


To what class it belongs?

Organization

Note

  • Make all the changes in the github/organization.py file.
  • Use a private method __scrape_page to scrape the page.
  • Include the method in the documentation.md file.
  • Upload a screenshot with the PR, demonstrating the method.
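
The pattern described in the note (a private `__scrape_page` helper plus a public method that parses its result) might be sketched as follows. This is not the project's implementation: the `Organization` class below is a stand-in, its constructor parameter name is an assumption, and the `Counter` markup that `parse_people_count` looks for is a hypothetical page structure.

```python
import re
import urllib.request


def parse_people_count(html):
    """Return the people count found in the page HTML, or None if absent.

    Assumes hypothetical markup such as:
    <a href="/orgs/NAME/people">People <span class="Counter">42</span></a>
    """
    pattern = r'/people"[^>]*>.*?<span[^>]*class="Counter"[^>]*>\s*([\d,]+)\s*</span>'
    match = re.search(pattern, html, re.DOTALL)
    if match is None:
        return None
    return int(match.group(1).replace(",", ""))


class Organization:
    """Stand-in class for illustration; not the package's own class."""

    def __init__(self, organization_name):
        self.organization = organization_name

    def __scrape_page(self):
        # Private helper: download the organization page as text.
        url = f"https://github.com/{self.organization}"
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.read().decode("utf-8")

    def peoples(self):
        # Public method: fetch the page, then parse the counter out of it.
        return parse_people_count(self.__scrape_page())
```

Keeping the parsing in a separate function means it can be checked against a saved HTML snippet without making a request.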

Improve contribution.md file.

As the folder structure of the project has changed, the contribution.md file needs minor updates to match.

Pinned Repository

We can add a method to retrieve the list of pinned repositories for a given user.

Labels given to a particular issue.

Proposed Method:

.labels()

Description

Returns a list of labels given to an issue.


To what class it belongs?:

Issue

Make changes at

src/scrape_up/github/issue.py
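
One way to approach label extraction is sketched below using only the standard library. The `IssueLabel` class name on the anchor tags is an assumption about the page markup, and `LabelParser` and `extract_labels` are hypothetical names, not part of Scrape Up. The function returns an empty list when no labels are present.

```python
from html.parser import HTMLParser


class LabelParser(HTMLParser):
    """Collect the text of <a> tags whose class list contains "IssueLabel"."""

    def __init__(self):
        super().__init__()
        self.labels = []
        self._in_label = False

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "a" and "IssueLabel" in classes:
            self._in_label = True
            self.labels.append("")

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_label = False

    def handle_data(self, data):
        # Accumulate text only while inside a label anchor.
        if self._in_label:
            self.labels[-1] += data


def extract_labels(html):
    """Return the label names found in the HTML, or [] if there are none."""
    parser = LabelParser()
    parser.feed(html)
    parser.close()
    return [label.strip() for label in parser.labels if label.strip()]
```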

Reviewers requested for review in a PR

Proposed Method:

.reviewers()

Description

Get the list of reviewers assigned in a PR, if any.

To what class it belongs?:

PullRequest()

Note

Add the method in github/pull_request.py

Get all Pull requests of repository

Proposed Method: .get_pull_requests()

Description

Scrape all the pull requests of the given repository.

To what class it belongs?: Repository

Example Usage

# Import the required module
from scrape_up import github

# Instantiate an object with the username and repository provided.
repo = github.Repository(username="Clueless-Community", repository="scrape-up")

# Call the get_pull_requests method
print(repo.get_pull_requests())

Output :

Returns all the pull requests of the given repository.

Labels given to a pull request.

Proposed Method:

.labels()

Description

Returns the list of labels given to a PR, or an empty list if no labels are given.

To what class it belongs?:

PullRequest()

Note

Add the method in github/pull_request.py

Get avatar of a GitHub organization.

Proposed Method:

.avatar()

Description

Create a method to scrape the avatar URL of a GitHub organization.

To what class it belongs?:

Organization

Note

  • Make all the changes in the github/organization.py file.
  • Use a private method __scrape_page to scrape the page.
  • Include the method in the documentation.md file.
  • Upload a screenshot with the PR, demonstrating the method.

Name of the user who opened an issue.

Proposed Method:

.opened_by()

Description

Get the name of the user who opened the particular issue.

To what class it belongs?:

Issue

Make changes at

src/scrape_up/github/issue.py

Files changed in a pull request.

Proposed Method:

.files_changed()

Description

Get the number of files changed in a PR.

To what class it belongs?:

PullRequest()

Note

Add the method in github/pull_request.py

Stars count of a repository.

Proposed Method:

.stars_count()

Description

Create a method to get the number of stars a repository has.

To what class it belongs?:

Repository
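
If the page shows the count in abbreviated form (for example `1.2k`), the scraped text needs converting before it can be returned as a number. A minimal sketch, assuming only `k` and `m` suffixes appear; `parse_count` is a hypothetical helper, not part of Scrape Up:

```python
def parse_count(text):
    """Convert a displayed count such as "83", "1,024" or "1.2k" to an int."""
    cleaned = text.strip().lower().replace(",", "")
    multipliers = {"k": 1_000, "m": 1_000_000}
    if cleaned and cleaned[-1] in multipliers:
        # Abbreviated form: scale the numeric part by the suffix.
        return int(round(float(cleaned[:-1]) * multipliers[cleaned[-1]]))
    return int(cleaned)
```

An abbreviated value is only approximate, so a method built on this would return a rounded count whenever the page itself rounds.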

Number of followers of an organization.

Proposed Method

.followers()

Description

Create a method to scrape the number of followers an organization has.

To what class it belongs?

Organization

Note

  • Make all the changes in the github/organization.py file.
  • Use a private method __scrape_page to scrape the page.
  • Include the method in the documentation.md file.
  • Upload a screenshot with the PR, demonstrating the method.

Commits made in a PR.

Proposed Method:

.commits()

Description

Get the

To what class it belongs?:

PullRequest()

Note

Add the method in github/pull_request.py

Commits made in a repository.

Proposed Method:

.commits()

Description

Create a method to get the number of commits made in a repository.

To what class it belongs?:

Repository

Make an issue and PR template for this repository.

We need to make an issue template and a pull request template for this project. Here are a few things we need to include:

Issue template

  • What method do you want to include?
  • To what class it belongs?

PR Templates

  • Some checkpoints
    • Have you followed the code convention?
    • Have you included the method/function in the documentation.md file?

This is just a brief outline; you can add relevant points of your own. But please do not dump a random template. Keep it short and precise.

Get a list of all starred repositories of a user

Proposed Method: get_starred_repos()

Description

With the help of get_starred_repos(), one can easily scrape out the list of starred repos of any user.

To what class it belongs?: Users

Example Usage

from scrape_up import github
user =  github.Users(username="nikhil25803")
print(user.get_starred_repos())

# Output - Starts listing all starred repos.......

I want to work on this issue under JWOC.

Get fork count of a repository.

Proposed Method:

.fork_count()

Description

Create a method that returns the fork count of a repository.

To what class it belongs?:

Repository

Regarding JWOC contribution

Hello, I am a Python developer and I like to work on automation and scripting. I want to contribute to this project.

Title of an issue.

Proposed Method:

.title()

Description

Get the title of an issue.

To what class it belongs?:

Issue

Make changes at

src/scrape_up/github/issue.py

Get Repositories of user

Add a feature to fetch all repositories of a user along with the tech stack used in each. I want to work on this issue under JWOC.

Get Followers List

Proposed Method: get_followers_list()

Description

Returns the list of followers for the given GitHub username.

To what class it belongs?: Users

Example Usage

# Import the required module
from scrape_up import github

# Instantiate an object with username provided.
user =  github.Users(username="nikhil25803")

# Call the get_followers_list method
print(user.get_followers_list())

# Output: Returns List of Followers

Please assign this issue to me under JWOC

Deploy the Website of this repo

Is your feature request related to a problem? Please describe.
I can see that the website related to this repo is not deployed anywhere.
Describe the solution you'd like
Deploy the website on cloud using Render since Heroku is now paid.

  • Assign this issue to me under JWOC

List of languages used in a repository.

Proposed Method:

.languages()

Description

Create a method to get the list of all the languages used in a repository.

To what class it belongs?:

Repository

Scrape the tags of a repository.

Proposed Method:

.tags()

Description

Create a method to get the tags of a repository, that is the title tags.

To what class it belongs?:

Repository

Topics mentioned in a repository.

Proposed Method:

.tags()

Description

Create a method to get the list of tags used in a repository.

To what class it belongs?:

Repository

Add method for getting readme

Is your feature request related to a problem? Please describe.
We may need to retrieve the README file of a user.

Describe the solution you'd like
Scrape the user's profile, get its README, and save it to the user's directory.

Describe alternatives you've considered
No alternatives

Additional context
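
The saving step of this request could be sketched as below. It assumes "directory of the user" means a folder named after the username under some base directory, which the issue does not spell out; `save_readme` and its parameters are hypothetical names, and fetching the README text is left out.

```python
from pathlib import Path


def save_readme(username, readme_text, base_dir="."):
    """Write readme_text to <base_dir>/<username>/README.md and return the path."""
    target_dir = Path(base_dir) / username
    # Create the per-user folder if it does not exist yet.
    target_dir.mkdir(parents=True, exist_ok=True)
    target_path = target_dir / "README.md"
    target_path.write_text(readme_text, encoding="utf-8")
    return target_path
```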

Get the main tech stack used for each repository

Proposed Method: get_techstack_repo()

Description

With the help of get_techstack_repo(), one can easily find the main tech stack used in each repository of a given user, which helps in picking repositories to contribute to based on the tech stack one is interested in.

To what class it belongs?: Users

Example Usage

from scrape_up import github
user =  github.Users(username="PalaVenkiReddy")

# Call the techstack function
print(user.get_techstack_repo())

# Output - {repo_name : tech_stack_used , . . . . . . . }

Time when the issue was open.

Proposed Method:

.opened_at()

Description

Returns a string containing the time range when the issue was opened, like "yesterday", "last week", etc.


To what class it belongs?:

Issue

Make changes at

src/scrape_up/github/issue.py

Get User Bio

Add a feature so that the program also fetches the user's bio. I want to work on this issue under JWOC.

Scrape the list of issues in a particular repository

Proposed Method: .get_issues()

Description

Scrape the list of issues open in a repository.

To what class it belongs?: Repository

Example Usage

from scrape_up import github
repo =  github.Repository(username="Clueless-Community", repository="scrape-up")
print(repo.get_issues())

Output :

Returns a list of all open issues in the repository.

Hey @nikhil25803, I would like to work on this issue under DWOC '23 :) 👍

Scrape the list of contributors of a repository

Proposed Method: .get_contributors()

Description

.get_contributors() will scrape out the contributors of a repository.

To what class it belongs?: Repository

Example Usage

from scrape_up import github
repo =  github.Repository(username="Clueless-Community", repository="scrape-up")
print(repo.get_contributors())

# Output - Starts listing all contributors......

I want to work on this issue under JWOC.

Pull request count.

Proposed Method:

.pull_requests()

Description

Create a method to get the number of pull requests opened in a repository.

To what class it belongs?:

Repository

Number of stars and repositories of a user.

  • Create a method with the Users class to fetch the number of repositories and star count of a user.

  • For example, the user nikhil25803 has a repository count of 45 and a star count of 68.

minor typo issue

I found a typo while running the commands given for the demo. The actual class name differs from the one written, so the command does not work properly. Please assign this issue to me.

Get the list of assignees assigned in an issue.

Proposed Method:

.assignees()

Description

Returns a list of the users assigned to an issue, or an empty list if there are none.

To what class it belongs?:

Issue

Make changes at

src/scrape_up/github/issue.py

README.md needs fixing

README.md needs fixing in the "How to use?" section. The link to the provided methods is broken and doesn't open "documentation.md".

Instagram

Is your feature request related to a problem? Please describe.
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]

Describe the solution you'd like
A clear and concise description of what you want to happen.

Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.

Additional context
Add any other context or screenshots about the feature request here.

Get title of a pull request.

Proposed Method:

.title()

Description

Get the title of a pull request.

To what class it belongs?:

PullRequest()

Note

Add the method in github/pull_request.py
