cbr2pdf's Introduction

cbr2pdf

cbr2pdf is a bash script that recursively converts all .cbr and .cbz files in a folder to PDF files in a separate folder, with pretty colors and stats. The script mainly uses ImageMagick to convert the images to PDF files and 7zip/p7zip to extract the archives.

TODO

  1. Find alternatives to p7zip, as it seems to behave differently across distros. For example, the macOS version of p7zip has no issues with one particular archive, but p7zip on Fedora cannot extract the exact same archive. On Xubuntu, the archive can be extracted but the files are corrupted. Interestingly, the commands 7za and 7z behave differently; the latter is able to extract the archive.
  2. Find a way to use img2pdf instead of ImageMagick
  3. Rewrite in Python? (See above)
  4. Ensure that this script runs on different systems, mainly BSD and other Linux distros like Fedora, CentOS, etc.

Performance

Recently, I've added a dodgy way of running the script in parallel. To see whether parallelisation helps, see the performance page.

TL;DR - The script runs very well with 2 parallel jobs, runs slower with 4, and slows down tremendously when the spinner is enabled.
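
As an illustration of capping the number of concurrent jobs in bash, here is a minimal sketch. It is not necessarily how cbr2pdf.sh implements --parallel; convert_one and run_parallel are hypothetical names, and convert_one only stands in for the real per-archive work.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the per-archive work (extract, convert, clean up).
convert_one() {
    printf 'done: %s\n' "$1"
}

# Run convert_one on every argument after the first, keeping at most
# $1 jobs running at the same time.
run_parallel() {
    local max_jobs="$1" running=0 file
    shift
    for file in "$@"; do
        if (( running >= max_jobs )); then
            wait -n                      # bash 4.3+: block until any one job exits
            running=$(( running - 1 ))
        fi
        convert_one "$file" &
        running=$(( running + 1 ))
    done
    wait                                 # wait for the remaining jobs
}
```

For example, `run_parallel 2 *.cbz` would process the archives in the current directory two at a time. Output order is not guaranteed, since jobs finish independently.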

Installation

Git

$ git clone https://github.com/Julian-Heng/cbr2pdf.git
$ cd cbr2pdf
$ ./cbr2pdf.sh

Curl

$ curl https://raw.githubusercontent.com/Julian-Heng/cbr2pdf/master/cbr2pdf.sh > cbr2pdf.sh
$ chmod +x cbr2pdf.sh
$ ./cbr2pdf.sh

Dependencies

The main commands used in this script are 7z and ImageMagick, but it also uses commands from GNU Core Utils such as sort, basename and printf, so do keep that in mind. If you're running Ubuntu, Arch Linux or any other kind of Linux, you should be fine.

The script also relies on bash-4.4 (September 2016) or above.
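
A version requirement like this can be checked with the built-in BASH_VERSINFO array. The sketch below is an assumption about how such a check might look, not the script's actual code; bash_version_ok is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Return 0 if the given major/minor version is at least 4.4.
bash_version_ok() {
    local major="$1" minor="$2"
    (( major > 4 || (major == 4 && minor >= 4) ))
}

# BASH_VERSINFO is a built-in array: [0] is the major version, [1] the minor.
if ! bash_version_ok "${BASH_VERSINFO[0]}" "${BASH_VERSINFO[1]}"; then
    printf 'bash 4.4 or above is required, found %s\n' "$BASH_VERSION" >&2
    # A real script would stop here (the README lists exit code 5 for this case).
fi
```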

For macOS, you'll need Homebrew to install ImageMagick and 7zip. It will also install the Xcode Command Line Tools, which include git. Curl is also not installed by default.

Installing Dependencies

Ubuntu/Debian based

$ sudo apt install p7zip-full imagemagick

Arch based

$ sudo pacman -S p7zip imagemagick

Fedora

$ sudo dnf install p7zip ImageMagick

openSUSE

$ sudo zypper install p7zip ImageMagick

FreeBSD

$ sudo pkg install p7zip imagemagick

macOS

$ brew install p7zip imagemagick

Usage

$ ./cbr2pdf.sh --option --option VALUE

Help Output

Usage:  ./cbr2pdf.sh --option --option VALUE

    Options:

    [-v|--verbose]          Enable verbose output
    [-x|--extract]          Only extract files
    [-h|--help]         Displays this message
    [-k|--keep]         Keep extracted files
    [-q|--quiet]            Suppress all output
    [-p|--parallel "VALUE"]     Run in parallel
    [-l|--loglevel "VALUE"]     Determine level of output details
    [-w|--overwrite]        Overwrite existing files
    [-i|--input "DIRECTORY"]    The input path for the files
    [-o|--output "DIRECTORY"]   The output path for the converted files
    [--version]         Print version number
    [--no-spinner]          Disable the spinner
    [--no-summary]          Disable printing summary (still print failed)
    [--no-color]            Disable color output
    [--no-list]         Disable printing file listing

    This bash script convert all comic book archives with the
    file extension .cbr or .cbz recursively from a folder
    to PDF files in a seperate folder. It can also do single
    files.

    This script mainly uses ImageMagick to convert the images
    to pdf files and 7zip/p7z to extract the archives.

    Made by Julian Heng

[!] Both folders must already exist before starting this script

Sample Output

┌[julian@Julians-MacBook-Pro]-(~)
└> ./cbr2pdf.sh -i ~/Input -o ~/Output
================================================
[!] File list
================================================
/Users/julian/Input/(2010) The Transformers - Drift [#1-4]/The Transformers - Drift 01 (of 04) (2010).cbz
/Users/julian/Input/(2010) The Transformers - Drift [#1-4]/The Transformers - Drift 02 (of 04) (2010).cbz
/Users/julian/Input/(2010) The Transformers - Drift [#1-4]/The Transformers - Drift 03 (of 04) (2010).cbz
/Users/julian/Input/(2010) The Transformers - Drift [#1-4]/The Transformers - Drift 04 (of 04) (2010).cbz

================================================
[!] File information
================================================
Job Number:		1/4
Output Directory:	/Users/julian/Output/Input/(2010) The Transformers - Drift [#1-4]
Source File:		/Users/julian/Input/(2010) The Transformers - Drift [#1-4]/The Transformers - Drift 01 (of 04) (2010).cbz

[!] Extracting archive...
[!] No subfolders detected...
[!] Converting to PDF...
[!] Deleting extracted files...
[!] Finish converting "The Transformers - Drift 01 (of 04) (2010).cbz"

================================================
[!] File information
================================================
Job Number:		2/4
Output Directory:	/Users/julian/Output/Input/(2010) The Transformers - Drift [#1-4]
Source File:		/Users/julian/Input/(2010) The Transformers - Drift [#1-4]/The Transformers - Drift 02 (of 04) (2010).cbz

[!] Extracting archive...
[!] No subfolders detected...
[!] Converting to PDF...
[!] Deleting extracted files...
[!] Finish converting "The Transformers - Drift 02 (of 04) (2010).cbz"

================================================
[!] File information
================================================
Job Number:		3/4
Output Directory:	/Users/julian/Output/Input/(2010) The Transformers - Drift [#1-4]
Source File:		/Users/julian/Input/(2010) The Transformers - Drift [#1-4]/The Transformers - Drift 03 (of 04) (2010).cbz

[!] Extracting archive...
[!] No subfolders detected...
[!] Converting to PDF...
[!] Deleting extracted files...
[!] Finish converting "The Transformers - Drift 03 (of 04) (2010).cbz"

================================================
[!] File information
================================================
Job Number:		4/4
Output Directory:	/Users/julian/Output/Input/(2010) The Transformers - Drift [#1-4]
Source File:		/Users/julian/Input/(2010) The Transformers - Drift [#1-4]/The Transformers - Drift 04 (of 04) (2010).cbz

[!] Extracting archive...
[!] No subfolders detected...
[!] Converting to PDF...
[!] Deleting extracted files...
[!] Finish converting "The Transformers - Drift 04 (of 04) (2010).cbz"

================================================
[!] Completed files
================================================
/Users/julian/Input/(2010) The Transformers - Drift [#1-4]/The Transformers - Drift 01 (of 04) (2010).cbz
/Users/julian/Input/(2010) The Transformers - Drift [#1-4]/The Transformers - Drift 02 (of 04) (2010).cbz
/Users/julian/Input/(2010) The Transformers - Drift [#1-4]/The Transformers - Drift 03 (of 04) (2010).cbz
/Users/julian/Input/(2010) The Transformers - Drift [#1-4]/The Transformers - Drift 04 (of 04) (2010).cbz

================================================
[!] Finish converting all files
================================================

┌[julian@Julians-MacBook-Pro]-(~)
└>

Exit Codes

  • 0 - Finished successfully
  • 1 - Script was interrupted
  • 2 - Unknown flags or no flags parsed
  • 3 - Input/Output directory not valid
  • 4 - 7z/unzip or ImageMagick not installed
  • 5 - Wrong bash version
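
When calling the script from another script, the exit codes listed above can be mapped back to messages. This is a small sketch for illustration; describe_exit is a hypothetical helper and is not part of cbr2pdf.

```shell
#!/usr/bin/env bash
# Translate a cbr2pdf exit code into the description given in the list above.
describe_exit() {
    case "$1" in
        0) echo "Finished successfully" ;;
        1) echo "Script was interrupted" ;;
        2) echo "Unknown flags or no flags parsed" ;;
        3) echo "Input/Output directory not valid" ;;
        4) echo "7z/unzip or ImageMagick not installed" ;;
        5) echo "Wrong bash version" ;;
        *) echo "Unknown exit code: $1" ;;
    esac
}

# Example use after a run:
#   ./cbr2pdf.sh -i ~/Input -o ~/Output
#   describe_exit "$?"
```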

Process

Simplified

Basically, the script performs 6 steps:

  1. List all files within the input directory
  2. Create the same folder structure as the input directory into the output directory
  3. Extract using 7z or unzip to the output directory
  4. Convert using ImageMagick from all .jpg or .png to .pdf
  5. Delete extracted files
  6. Loop until all files are done

Advanced

Prerun

First, the script runs all the prechecks before running the main script. This involves parsing all arguments, printing verbose and debug information, and checking the directories and required applications before continuing.
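
The application check could be sketched with command -v, as below. This is an assumption about the approach rather than the script's real code; first_available and the extractor variable are hypothetical names.

```shell
#!/usr/bin/env bash
# Print the first of the given commands that exists on PATH; return 1 if none do.
first_available() {
    local cmd
    for cmd in "$@"; do
        if command -v "$cmd" >/dev/null 2>&1; then
            printf '%s\n' "$cmd"
            return 0
        fi
    done
    return 1
}

# Prefer 7z and fall back to unzip; a real script would exit with code 4 here.
extractor="$(first_available 7z unzip)" \
    || echo "neither 7z nor unzip is installed" >&2

# ImageMagick provides the convert command.
first_available convert >/dev/null \
    || echo "ImageMagick (convert) is not installed" >&2
```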

Setting Variables

Using the find command, we create an array containing all the files to be converted, which is then sorted. That array is then fed into a while loop as the variable inputFile. This variable is then separated into parent, source_dir, source_file, source_filename, source_ext and output.

  • parent: Parent directory
  • source_dir: Source directory
  • source_file: Source file
  • source_filename: Source filename
  • source_ext: Source file extension
  • output: Destination directory

Doing so makes it easier to recreate the folder structure in the output directory, as well as to detect the file type so that files that aren't .cbr or .cbz can be filtered out.
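
The splitting described above can be sketched with dirname, basename and parameter expansion. The exact meaning of each variable is an assumption inferred from the sample output (where ~/Input maps to ~/Output/Input/...); split_path, list_archives, input_dir and output_dir are hypothetical names.

```shell
#!/usr/bin/env bash
# Split one input path into the pieces named in the list above.
split_path() {
    local input_dir="$1" output_dir="$2" inputFile="$3"
    parent="$(dirname "$input_dir")"               # e.g. /Users/julian
    source_dir="$(dirname "$inputFile")"           # directory holding the archive
    source_file="$(basename "$inputFile")"         # file name with extension
    source_filename="${source_file%.*}"            # file name without extension
    source_ext="${source_file##*.}"                # e.g. cbz
    output="${output_dir}${source_dir#"$parent"}"  # mirrored folder in the output
}

# List .cbr/.cbz files, sorted and NUL-delimited so that unusual file
# names survive. sort -z is assumed to be available (GNU coreutils has it).
list_archives() {
    find "$1" -type f \( -iname '*.cbr' -o -iname '*.cbz' \) -print0 | sort -z
}

# Reading the list into an array needs bash 4.4+ for mapfile -d:
#   mapfile -d '' -t files < <(list_archives "$input_dir")
```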

Extracting Files

After setting the variables, we can now extract the files from the input directory into the output directory. 7z is used for extracting because unzip is unable to extract rar archives. unzip, however, is used as a fallback if 7z is not present. The extract function uses the variables inputFile and output/source_file to extract the files into the output directory, within a folder with the same name as the inputFile.
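
A minimal sketch of an extract step with the 7z-then-unzip fallback follows. It is an illustration under assumptions, not the script's actual extract function; the argument order and the return value 4 (borrowed from the exit code list) are choices made for this example.

```shell
#!/usr/bin/env bash
# Extract archive $1 into directory $2, preferring 7z and falling back to unzip.
extract() {
    local archive="$1" dest="$2"
    mkdir -p "$dest"
    if command -v 7z >/dev/null 2>&1; then
        # -o sets the output directory (no space after -o); -y assumes yes on prompts.
        7z x "$archive" -o"$dest" -y >/dev/null
    elif command -v unzip >/dev/null 2>&1; then
        # unzip cannot read rar data, so this only helps with zip-based archives.
        unzip -q -o "$archive" -d "$dest"
    else
        echo "neither 7z nor unzip is installed" >&2
        return 4
    fi
}

# Example: extract "$inputFile" "$output/$source_file"
```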

Checking for Subfolders

Sometimes, the images are within a folder after extraction. This step detects whether there is a subfolder after extraction and, if so, moves the files up one level. It uses the find command to list any directories within a max depth of 2. If there is a subfolder within the extracted directory, the find command prints out 2 lines: the first is the output directory and the second is the subfolder itself. Using sed, we assign the second line of the output to the variable check. A simple check is then performed to see whether the check variable is empty. If it is, there's no subfolder and the script continues. If it isn't empty, the files inside the subfolder are brought up one level using mv.
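
The subfolder check can be sketched as follows. This follows the description above but is not the script's actual code; flatten_subfolder is a hypothetical name, and the sketch assumes file names without newlines and a non-empty subfolder.

```shell
#!/usr/bin/env bash
# Move the contents of the first subfolder of $1 up one level, if there is one.
flatten_subfolder() {
    local dir="$1" check
    # find lists $dir itself first; line 2, if present, is the first subfolder.
    check="$(find "$dir" -maxdepth 2 -type d | sed -n '2p')"
    if [[ -z "$check" ]]; then
        echo "No subfolders detected"
    else
        # Bring everything inside the subfolder up one level.
        mv "$check"/* "$dir"/
    fi
}
```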

Checking for case-sensitive extensions

Due to case-sensitive conditionals in bash, the extracted directory is checked for file extensions that are either all uppercase (JPG) or capitalized (Jpg). This ensures that when converting, the script will not choke on file extensions that aren't all lowercase.
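
One way to normalise extensions is to rename files using bash's lowercase expansion. This is a sketch of the idea rather than the script's own method; lowercase_extensions is a hypothetical name.

```shell
#!/usr/bin/env bash
# Rename every file in $1 whose extension is not all lowercase.
lowercase_extensions() {
    local dir="$1" file ext lower
    for file in "$dir"/*.*; do
        [[ -f "$file" ]] || continue     # skip directories and unmatched globs
        ext="${file##*.}"                # text after the last dot
        lower="${ext,,}"                 # bash 4+ lowercase expansion
        if [[ "$ext" != "$lower" ]]; then
            mv -- "$file" "${file%.*}.$lower"
        fi
    done
}
```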

Converting

Just like in the extract function, we use convert inside a function, passing the arguments output/source_file/*.{jpg,png} and output/source_filename.pdf. The first argument matches all images inside the extracted folder and the second is the final converted file.
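
A sketch of the conversion call, assuming ImageMagick's convert is on PATH. images_to_pdf is a hypothetical name chosen for this example, and the empty-folder guard is an addition that the text above does not describe.

```shell
#!/usr/bin/env bash
# Convert all .jpg and .png files in directory $1 into the single PDF $2.
images_to_pdf() {
    local src_dir="$1" pdf="$2"
    local images=()
    shopt -s nullglob                    # unmatched patterns expand to nothing
    images=("$src_dir"/*.{jpg,png})
    shopt -u nullglob
    if (( ${#images[@]} == 0 )); then
        echo "no .jpg or .png files in $src_dir" >&2
        return 1
    fi
    convert "${images[@]}" "$pdf"
}

# Example: images_to_pdf "$output/$source_file" "$output/$source_filename.pdf"
```

Note that the brace pattern expands to all .jpg files followed by all .png files, so an archive that mixes both formats would not keep its pages interleaved in name order.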

Deleting

After converting the files, the extracted folder is deleted and the script moves on to the next file.

License

This project is licensed under the GPL-3.0 License - see the LICENSE file for details

cbr2pdf's People

Contributors

julian-heng, salazarbarrera

cbr2pdf's Issues

Not this project's problem, but some problems and their solutions

This script is perfect, thanks. But the tools it depends on gave me some trouble that needed ironing out. I don't know how to leave a comment or whatever on GitHub, so I did this. I hope it can help other people.

error: insufficient image data in file

convert-im6.q16: insufficient image data in file `/home/peter/backup/Avatar_The_Last_Airbender_commic/pdf_total/operation/Avatar - The Last Airbender (08) - Imbalance Part 01 (2018) (digital) (Son of Ultron-Empire)/Avatar- The Last Airbender - Imbalance Part One-064.jpg' @ error/jpeg.c/ReadJPEGImage_/1166

Debugging path

function: convert_file

In the function convert_file, print the command that is executed, then try to execute it in bash.

Output:

convert-im6.q16: unable to open image `/home/peter/backup/Avatar_The_Last_Airbender_commic/pdf_total/operation/Avatar_The_Last_Airbender_02_The_Lost_Adventures/*.jpg': No such file or directory @ error/blob.c/OpenBlob/2924.

Check the image folder:

-rw-rw-r-- 1 peter peter 0 Dec  1  2018 'Avatar- The Last Airbender - Imbalance Part One-052.jpg'

All the .jpg files have a size of zero, so I guess the problem is in the image extraction step.

function: extract

Same method: echo the executed command and see what's going on.

For RAR, the script uses 7z x

ERROR: Unsupported Method : Avatar- The Last Airbender - Imbalance Part One-070.jpg

Search result: Failing to unrar files

Turns out you need the non-free 7z rar module

sudo apt-get install -y p7zip-rar  # Ubuntu 22.04; may be different on other distros

Now the 7z x test run successfully outputs the images, but a new error is given.

convert-im6.q16: cache resources exhausted `/home/peter/backup/Avatar_The_Last_Airbender_commic/pdf_total/operation/Avatar - The Last Airbender - The Search Part 2 (2013) (digital) (Son of Ultron II-Empire)/Avatar - The Last Airbender - The Search Part 2-034.jpg' @ error/cache.c/OpenPixelCache/4095.

error:  cache resources exhausted

OK, back to the convert_file function.

Try to run

convert *.jpg -density 100  test.pdf

It gives the same error

convert-im6.q16: cache resources exhausted `Avatar - The Last Airbender - The Search Part 3-034.jpg' @ error/cache.c/OpenPixelCache/4095.

Search for the solution: cache resources exhausted ImageMagick

Increase the cache size.

Try to find the policy file (P.S. I think this step is not necessary)

sudo find / -name "policy.xml" 2>/dev/null

Edit the configuration file that was found, `/etc/ImageMagick-6/policy.xml`

Change

<policy domain="resource" name="disk" value="1GiB"/>

to a bigger value

<policy domain="resource" name="disk" value="9091GiB"/>

Try to execute convert *.jpg -density 100 test.pdf

It gives a new error

convert-im6.q16: attempt to perform an operation not allowed by the security policy `PDF' @ error/constitute.c/IsCoderAuthorized/421.

error: attempt to perform an operation not allowed by the security policy `PDF'

ImageMagick security policy 'PDF' blocking conversion

Edit the line

<policy domain="coder" rights="none" pattern="PDF" />

to

<policy domain="coder" rights="read | write" pattern="PDF" />

Test run: it outputs a normal PDF. Everything works.
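
The PDF policy edit described in this post could also be done with sed instead of a text editor. This is only a sketch: allow_pdf is a hypothetical helper, it assumes the policy line is written exactly as quoted above, and it prints the edited file to standard output rather than changing anything in place.

```shell
#!/usr/bin/env bash
# Print policy file $1 with the PDF coder policy changed from "none"
# to "read | write". The original file is left untouched.
allow_pdf() {
    sed 's#rights="none" pattern="PDF"#rights="read | write" pattern="PDF"#' "$1"
}

# Example (path taken from the post; review the result before replacing
# the original, which needs root):
#   allow_pdf /etc/ImageMagick-6/policy.xml > /tmp/policy.xml
#   identify -list policy    # shows the policies ImageMagick has loaded
```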

Step-by-step guide?

Can we potentially get a step-by-step guide on how to use this? Currently working via the Mac terminal.
