mangascraper's Introduction

Manga/Manhua Scraper

  • Download your favorite Webtoons.
  • Search across various websites.
  • Merge downloaded Webtoons into one or two images.
  • Convert downloaded Webtoons into PDF file.
  • Search and find what you want.
  • Download full database of a website.
  • Find the sauce of an image.

Setup

  • After cloning the repository use pip install -r requirements.txt to install requirements.
  • List of implemented modules is available in modules.yaml file.

Command line interface

Command center gives you various options like:

  • download a single manga/manhua/doujin or multiple.
  • automatically merge them and convert them into pdf using -m.
  • if you also set -fit, images are merged and resized so that they all share the same width and no white space is added to the final images.
  • change the sleep time between requests using -t. the default is 0.1 sec.
  • merge images of a single folder or subfolders of a folder.
  • convert images of a single folder or subfolders of a folder to a pdf file.
  • search in websites using implemented modules.
  • set the chapter numbers to download when downloading a single manga.

Modules

Various modules have been implemented so far. They all inherit from models.
Each one is implemented differently, depending on how the website is developed.
If custom user agents or cookies are required, the module sends requests to the website directly.
The modules are loaded from the modules.yaml file in modules_contributer.py and can be accessed with the get_modules function.
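
The loading step described above could look roughly like the sketch below. It is not the project's code: the `Module` base class, the `MangaPark` subclass, `CONFIG`, `KNOWN_CLASSES`, and this `get_modules` signature are all hypothetical, and a plain dict stands in for the parsed contents of modules.yaml so that the example needs no YAML library.

```python
class Module:
    """Hypothetical base model shared by all site modules."""

    domain = ""

    def get_chapters(self, url):
        raise NotImplementedError


class MangaPark(Module):
    domain = "mangapark.to"

    def get_chapters(self, url):
        # A real module would request and parse the site here, sending its
        # own user agent or cookies when the site requires them.
        return []


# Stand-in for the parsed modules.yaml (assumed layout: domain -> class name).
CONFIG = {"mangapark.to": "MangaPark"}

# Classes that the names in CONFIG may refer to.
KNOWN_CLASSES = {cls.__name__: cls for cls in (MangaPark,)}


def get_modules(config=CONFIG):
    """Return a {domain: module instance} mapping built from the config."""
    return {domain: KNOWN_CLASSES[name]() for domain, name in config.items()}


modules = get_modules()
print(type(modules["mangapark.to"]).__name__)  # MangaPark
```

Keying the mapping by domain matches how the -s argument names a module on the command line.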

Download a single manga

When downloading a single manga using manga -single, a module and a URL must be provided.
You can specify which chapters to download using the [-l, -r, -c] arguments.
By default, all chapters are downloaded.
The manga name and the merging arguments are optional.

Examples:

  • all chapters: python cli.py manga -single 11643-attack-on-titan -s mangapark.to
  • chapters after a certain chapter: python cli.py manga -single secret-class -s manhuascan.us -l 52
  • chapters between two chapters: python cli.py manga -single secret-class -s manhuascan.us -r 20 30
  • specific chapters: python cli.py manga -single secret-class -s manhuascan.us -c 5 10 36
  • with optional arguments: python cli.py manga -single secret-class -s manhuascan.us -n "Secret Class" -m -p
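
The three chapter filters can be pictured with the small sketch below. It only illustrates the selection logic and is not taken from the project: `select_chapters` and its parameter names are hypothetical, and it assumes that -l excludes the given chapter while -r includes both ends of the range.

```python
def select_chapters(chapters, last=None, range_=None, chosen=None):
    """Filter chapter numbers the way -l, -r and -c are described above.

    Assumptions: -l is exclusive ("after"), -r is inclusive at both ends.
    """
    if last is not None:
        return [c for c in chapters if c > last]
    if range_ is not None:
        start, end = range_
        return [c for c in chapters if start <= c <= end]
    if chosen is not None:
        wanted = set(chosen)
        return [c for c in chapters if c in wanted]
    # No filter given: download everything, which is the default behaviour.
    return list(chapters)


available = list(range(1, 61))
print(select_chapters(available, last=52))         # chapters 53 to 60
print(select_chapters(available, range_=(20, 30)))  # chapters 20 to 30
print(select_chapters(available, chosen=[5, 10, 36]))
```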

Download mangas of a file

Let's say you read a couple of mangas that are updated on a weekly basis and you want to download all new chapters. In that case, use the -file option.
When downloading more than one manga using manga_file.py, you should specify the name of a JSON file.
The JSON file is updated automatically after each chapter is downloaded.
Example: python cli.py manga -file mangas.json
The JSON file should look like this:

{
    "Attack on Titan": {
        "include": true,
        "domain": "mangapark.to",
        "url": "11643-attack-on-titan",
        "last_downloaded_chapter": null,
        "chapters": []
    },
    "Secret Class": {
        "include": true,
        "domain": "manhuascan.us",
        "url": "secret-class",
        "last_downloaded_chapter": {
            "url": "chapter-100",
            "name": "Chapter 100"
        },
        "chapters": []
    }
}
  • if "last_downloaded_chapter" is null, all chapters are added to the download list.
  • if "last_downloaded_chapter" has a valid value, the chapters after it are automatically added to the download list.
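
The two rules above amount to slicing the site's chapter list at the saved marker. The sketch below shows one way to do that; it is an illustration only. `chapters_to_download` is a hypothetical helper, the chapter list is assumed to be ordered oldest first with the same "url" and "name" keys as the JSON entry, and the fallback for an unknown marker is an assumption rather than documented behaviour.

```python
def chapters_to_download(entry, available):
    """Return the chapters that come after entry["last_downloaded_chapter"]."""
    if not entry.get("include", True):
        return []
    last = entry.get("last_downloaded_chapter")
    if last is None:
        return list(available)
    urls = [chapter["url"] for chapter in available]
    if last["url"] not in urls:
        # Assumed fallback: an unknown marker is treated like null.
        return list(available)
    return available[urls.index(last["url"]) + 1:]


entry = {
    "include": True,
    "domain": "manhuascan.us",
    "url": "secret-class",
    "last_downloaded_chapter": {"url": "chapter-100", "name": "Chapter 100"},
    "chapters": [],
}
available = [
    {"url": f"chapter-{n}", "name": f"Chapter {n}"} for n in range(98, 103)
]

for chapter in chapters_to_download(entry, available):
    # After each download the marker moves forward, so a rerun resumes here.
    entry["last_downloaded_chapter"] = chapter

print(entry["last_downloaded_chapter"]["name"])  # Chapter 102
```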

Download a doujin by its code

You can download a doujin from an implemented module just by entering its code.
Note: You can use -code or -single argument to download a single doujin (it doesn't matter).
Example: python cli.py doujin -code 000000 -s hentaifox.com

Download doujins of a file

If you have a couple of codes and want to download all of them at once, you can put them in a JSON file like the one below and use the -file option.
When downloading more than one doujin using doujin_file.py, you should specify the name of a JSON file.
The JSON file is updated automatically after each doujin is downloaded.
Example: python cli.py doujin -file doujins.json
The JSON file should look like this:

{
    "nyahentai.red": [
        999999,
        999998
    ],
    "hentaifox.com": [
        999997,
        999996
    ]
}

Image merger

You can vertically merge all chapters of a manga, or any folder that contains images.
Before the merge process starts, all images are validated to avoid exceptions.
Examples:

  • merge an entire manga: python cli.py merge -bulk "One Piece"
  • merge a folder: python cli.py merge -folder "path/to/folder"
  • merge a folder and resize it: python cli.py merge -folder "path/to/folder" -fit
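
The size arithmetic behind a fitted vertical merge can be sketched without any imaging library. The example below is an illustration under stated assumptions, not the project's implementation: `fit_sizes` and `group_by_height` are hypothetical helpers, the common width is assumed to be the narrowest image's width, and `max_height` is an invented limit on the height of one merged image.

```python
def fit_sizes(sizes):
    """Scale (width, height) pairs to a common width, keeping aspect ratio."""
    target = min(width for width, _ in sizes)
    return [(target, round(height * target / width)) for width, height in sizes]


def group_by_height(sizes, max_height):
    """Split sizes into batches whose stacked height stays within max_height."""
    batches, current, used = [], [], 0
    for size in sizes:
        if current and used + size[1] > max_height:
            batches.append(current)
            current, used = [], 0
        current.append(size)
        used += size[1]
    if current:  # flush the final, partially filled batch
        batches.append(current)
    return batches


pages = [(800, 1200), (720, 1080), (1600, 2000)]
fitted = fit_sizes(pages)
print(fitted)  # [(720, 1080), (720, 1080), (720, 900)]
print(group_by_height(fitted, max_height=2500))
```

Each batch would become one output image. The final flush matters because the last batch rarely fills up to the limit exactly.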

PDF converter

You can also convert chapters to PDF to make them easier to read.
Before the conversion starts, all images are validated to avoid exceptions.
Converting chapters that have already been merged into fewer images is highly recommended.
Examples:

  • convert an entire manga: python cli.py c2pdf -bulk "One Piece"
  • convert a folder: python cli.py convert -folder "path/to/folder" -n "pdf_name.pdf"

Search engine

The search engine lets you search across all modules that implement a search function.
Unlike downloading with the -single argument, you can specify multiple modules with -s.
If you don't use -s, all modules are searched.
Set the page limit with the -page-limit argument.
You can limit the results by setting the -absolute argument.
Examples:

  • search in one module: python cli.py search -s manhuascan.us -n "secret"
  • search in multiple modules: python cli.py search -s mangapark.to manga68.com -n "secret"
  • search in all modules: python cli.py search -n "secret"
  • e.g. python cli.py search -s manhuascan.us -n "secret" -page-limit 5 -absolute -t 1
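
How the page limit, the sleep time, and the result filter interact can be sketched as follows. This is an illustration only: `search` and `fetch_page` are hypothetical names, and the meaning given to -absolute here (keep only titles that contain the keyword) is an assumption, since the text above only says that it limits the results.

```python
import time


def search(fetch_page, keyword, page_limit=3, sleep_time=0.1, absolute=False):
    """Collect results page by page, pausing between requests.

    fetch_page(keyword, page) is a hypothetical callable that returns a list
    of titles, or an empty list when there are no more pages.
    """
    results = []
    for page in range(1, page_limit + 1):
        titles = fetch_page(keyword, page)
        if not titles:
            break
        if absolute:
            # Assumed meaning of -absolute: keep titles containing the keyword.
            titles = [t for t in titles if keyword.lower() in t.lower()]
        results.extend(titles)
        time.sleep(sleep_time)
    return results


# Fake site with two result pages, used instead of a real request.
FAKE_PAGES = {1: ["Secret Class", "Hidden Room"], 2: ["My Secret"]}


def fake_fetch(keyword, page):
    return FAKE_PAGES.get(page, [])


print(search(fake_fetch, "secret", page_limit=5, sleep_time=0, absolute=True))
```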

Database crawler

The database crawler lets you download the database of any module that implements the get_db function.
You can only get the database of one module at a time.
The result of the crawl is saved to a JSON file named after the module.
Example: python cli.py db -s manhuascan.us

Saucer

If you don't know the sauce of an image, you can find it using the saucer.
Examples:

  • find the sauce using an image file: python cli.py sauce -image "path/to/image"
  • find the sauce using url of an image: python cli.py sauce -url "url/to/image"

Modules Checker

To check whether a module is functional, use the check option.
If you don't use -s, all modules are checked.
Examples:

  • check one module: python cli.py check -s manhuascan.us
  • check all modules: python cli.py check

mangascraper's People

Contributors

nekochan0122, yofagh

mangascraper's Issues

Merged image skipped the last dozens images in a folder

I ran python cli.py merge -folder "path_to_images_folder" -fit
It properly merged the first hundred or so images into two separate merged images, but the remaining images were not merged and were simply left out. Can you help me with this?

SyntaxError: f-string: unmatched '('

D:\Github\Merge-image>python cli.py merge -folder "Merged\CH (1)" -fit
Traceback (most recent call last):
  File "D:\Github\Merge-image\cli.py", line 86, in <module>
    from utils.image_merger import merge_folder
  File "D:\Github\Merge-image\utils\image_merger.py", line 42
    copy2(list_to_merge[0].filename, f'{path_to_destination}/{index+1:03d}.{list_to_merge[0].filename.split('.')[-1]}')
    ^
SyntaxError: f-string: unmatched '('
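
Python versions before 3.12 do not allow the enclosing quote character to be reused inside an f-string replacement field, which is what the `split('.')` call inside the single-quoted f-string on line 42 does. One possible workaround, sketched below, is to compute the extension before building the string. `destination_name` is a hypothetical helper written for this illustration; it is not the project's actual fix.

```python
def destination_name(path_to_destination, index, filename):
    """Build '<destination>/<NNN>.<ext>' without nesting quotes in the f-string."""
    extension = filename.split(".")[-1]
    return f"{path_to_destination}/{index + 1:03d}.{extension}"


print(destination_name("Merged/CH (1)", 0, "page.jpg"))  # Merged/CH (1)/001.jpg
```

Using double quotes for the inner string, or running Python 3.12 or later, would also avoid the error.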
