
Archive Binge

AB was a webcomic aggregator and reader. As the original developer is no longer able to work on the project, the source code is made available here for use, reproduction, modification, display, distribution, and community contribution. See the license below for details on what you may do with this source code. If you use this code, you must provide access to the source, either by linking to this repo (if unmodified) or by linking to your own public repo.

Requirements

  • PHP 7.3+
  • Python 2.7
  • MySQL (Preferably MariaDB 10.2+)

Installation

git clone git@github.com:Respheal/archivebinge.git
cd ./archivebinge/
sudo apt-get install python-dev python-pip libxml2-dev libxslt1-dev zlib1g-dev libffi-dev libssl-dev
python -m virtualenv ./crawler/crawlerenv
source ./crawler/crawlerenv/bin/activate
pip install -U pip
pip install -r ./crawler/requirements.txt

Before use, you should update the following strings throughout the codebase. For the frontend site, these variables are managed in includes/conf.inc.php, but they may also need to be updated in the Python scripts.

Any instance of:

  • '/full/path/to/archivebinge/crawler/crawlerenv/' should be updated to the path of the virtualenv created above
  • DATABASE_HOST should be your database host (probably 'localhost')
  • DATABASE_USER should be your database user
  • DATABASE_PASSWORD should be your database password
  • DATABASE_NAME should be your database name
  • SECRET_KEY should be a unique key used for encryption
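
Any sufficiently long random string will do for SECRET_KEY. One way to generate one is sketched below, using only the standard library and staying compatible with the project's Python 2.7 requirement:

```python
import binascii
import os

def make_secret_key(nbytes=32):
    """Return a hex string of `nbytes` of OS-level randomness.

    Works on both Python 2.7 (the crawler's target) and Python 3.
    """
    return binascii.hexlify(os.urandom(nbytes)).decode("ascii")

print(make_secret_key())
```

Paste the printed value into includes/conf.inc.php and any Python scripts that reference SECRET_KEY.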

Files to rename:

mv ./crawler/minisup.sample.py ./crawler/minisup.py
mv ./crawler/supervisor.sample.py ./crawler/supervisor.py
mv ./includes/conf.inc.sample.php ./includes/conf.inc.php
mv ./includes/tos.inc.sample.php ./includes/tos.inc.php
mv ./includes/privacy.inc.sample.php ./includes/privacy.inc.php

In includes/conf.inc.php, update the SUPPORT_EMAIL, FEEDBACK_EMAIL, and ABUSE_EMAIL variables to your contact information.

To use the social media logins, you will need to configure their OAuth settings in includes/conf.inc.php:

Facebook:
See: https://developers.facebook.com/docs/facebook-login/web/

Twitter:
See: https://developer.twitter.com/en/docs/authentication/guides/log-in-with-twitter

Google:
See: https://developers.google.com/identity/protocols/oauth2

Lastly, although you may create a database to your own specifications, I've included a dump of an empty database that you may import: ./mysql_dump/ab_database.sql

Usage Notes

  1. Whichever user has an ID of 1 in the database is the admin user.
  2. I make absolutely no promises about the functionality, readability, or usability of any code. Use at your own risk.
  3. Not all files included are necessary for functionality (some tutorial files got left in).
  4. Updates to this repo may be pushed to archivebinge.com and are available for use in derivative sites and/or repos.

Crons

In order to collect comic updates, AB relies on two crons:

23,53 * * * * cd /path/to/public_html/crawler; ./minisup.py 2>> /path/to/public_html/crawler/minisuplog
*/15 * * * * cd /path/to/public_html/crawler; ./supervisor.py 2>> /path/to/public_html/crawler/supervisorlog

minisup.py collects updates for existing comics; supervisor.py collects updates for newly-added comics. You may set them to run at whatever intervals you like. Do check on them occasionally, though: some comics may trigger an infinite-crawl bug, spawning multiple processes and potentially exceeding your server's resource limits.
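
As a stopgap against that runaway-process bug, a watchdog along these lines could run from cron as well. This is only a sketch: the script name matched, the one-process limit, and the GNU `ps` sort flag are all assumptions to adapt to your server.

```python
import subprocess

def find_runaways(ps_lines, pattern, limit):
    """Return the PIDs of matching processes beyond `limit`.

    ps_lines: iterable of 'PID COMMAND' strings, oldest first,
    so the oldest `limit` matches are the ones kept alive.
    """
    pids = [int(line.split(None, 1)[0])
            for line in ps_lines
            if pattern in line]
    return pids[limit:]

def kill_runaways(pattern="supervisor.py", limit=1):
    # --sort=start_time is GNU procps; lists oldest processes first.
    out = subprocess.check_output(
        ["ps", "-eo", "pid,args", "--sort=start_time"]).decode()
    # Skip the header line, keep `limit` matches, kill the rest.
    for pid in find_runaways(out.splitlines()[1:], pattern, limit):
        subprocess.call(["kill", str(pid)])
```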

Crawlers

All Scrapy crawlers are stored under ./crawler/archivebinger/spiders. You can run the spiders manually like so:

./crawler/crawlerenv/bin/scrapy crawl typefinder -a starturl='https://comic.com/first-page' -a secondurl='https://comic.com/second-page' -a cid='crawldata.json'

This spider will find a reference to the second page on the first page of a comic and note that for future crawling. The output file, crawldata.json, contains variables used in the following:

./crawler/crawlerenv/bin/scrapy crawl superbinge -a starturl="https://comic.com/any-page" -a position="inner" -a tag="rel" -a identifier="next"

This will launch a crawler through all of the pages of the referenced comic.
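
The two invocations above can be glued together by reading typefinder's output file and building the superbinge command from it. A sketch, with the caveat that the JSON key names are assumptions inferred from the `-a` flags shown above, not confirmed against typefinder's actual output:

```python
import json

def superbinge_command(crawldata_path, starturl,
                       scrapy="./crawler/crawlerenv/bin/scrapy"):
    """Build a superbinge argv list from a typefinder output file.

    Assumes crawldata.json holds 'position', 'tag', and 'identifier'
    values matching the manual invocation; check your own typefinder
    output for the exact key names.
    """
    with open(crawldata_path) as f:
        data = json.load(f)
    return [scrapy, "crawl", "superbinge",
            "-a", "starturl=" + starturl,
            "-a", "position=" + data["position"],
            "-a", "tag=" + data["tag"],
            "-a", "identifier=" + data["identifier"]]
```

The resulting list can be passed straight to subprocess.call for unattended crawls.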

License

This Source Code Form is subject to the terms of the Mozilla Public License, v. 2.0. If a copy of the MPL was not distributed with this file, You can obtain one at https://mozilla.org/MPL/2.0/

Contributors

  • eishiya
  • respheal
