archive

About

This is the repository for a minimalist, dockerized application that archives web pages in the Internet Archive's Wayback Machine and saves a copy to the local machine. The local copy can be pushed to the repository itself, so the repository also serves as a web-based personal archive.

App

The app is composed of a main archive (bash) script and a companion savepage (python) script.

archive takes three positional arguments:

  • $1: URL to be archived

  • $2: name of the folder to save the webpage locally

  • $3: prefix of web-based personal archive

The script submits the URL to the Wayback Machine via waybackpy, saves a local copy via savepage, logs the shell output to outputs.txt, and appends information to a series of lists (original.lst, folder.lst, wayback.lst, and archive.lst), which are later used to create or extend a summary table.
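As a rough illustration of the bookkeeping described above, the following Python sketch appends one entry to each of the four list files. The real archive script is written in bash, so this is not its implementation. The function name `record_entry`, the one-entry-per-line layout, the content of folder.lst, and the way the personal-archive URL is joined from prefix, folder, and file name are all assumptions made for the example.

```python
from pathlib import Path


def record_entry(base_dir, original_url, folder, wayback_url, prefix, filename):
    """Append one archived page to the four .lst files (hypothetical layout)."""
    base = Path(base_dir)
    base.mkdir(parents=True, exist_ok=True)

    # Assumed URL scheme for the personal archive: <prefix>/<folder>/<filename>
    archive_url = f"{prefix.rstrip('/')}/{folder}/{filename}"

    entries = {
        "original.lst": original_url,
        "folder.lst": folder,
        "wayback.lst": wayback_url,
        "archive.lst": archive_url,
    }
    for name, value in entries.items():
        # Append mode keeps earlier entries; one entry per line.
        with open(base / name, "a", encoding="utf-8") as handle:
            handle.write(value + "\n")
    return archive_url
```

Because every file receives exactly one line per call, line N of each list would describe the same archived page, which is what makes it possible to combine the lists into a table later.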

savepage is the companion script: it downloads the webpage, extracts metadata, and outputs a metadata.txt file to the folder of interest.
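The metadata step could look roughly like the sketch below. It assumes the metadata of interest is the page title plus any `<meta name=... content=...>` pairs, and that metadata.txt holds one `key: value` pair per line; the real savepage script may extract different fields and use a different format. `MetadataParser`, `extract_metadata`, and `write_metadata` are hypothetical names, and the download itself is left out.

```python
from html.parser import HTMLParser
from pathlib import Path


class MetadataParser(HTMLParser):
    """Collect the <title> text and <meta name=... content=...> pairs."""

    def __init__(self):
        super().__init__()
        self.metadata = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name") and attrs.get("content"):
            self.metadata[attrs["name"].lower()] = attrs["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.metadata["title"] = self.metadata.get("title", "") + data.strip()


def extract_metadata(html):
    """Parse an HTML string and return the collected metadata as a dict."""
    parser = MetadataParser()
    parser.feed(html)
    return parser.metadata


def write_metadata(folder, metadata):
    """Write the metadata as 'key: value' lines to <folder>/metadata.txt."""
    path = Path(folder) / "metadata.txt"
    path.parent.mkdir(parents=True, exist_ok=True)
    lines = [f"{key}: {value}" for key, value in metadata.items()]
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path
```

Using only the standard library keeps the sketch free of extra dependencies; a real implementation might prefer a dedicated HTML library for malformed pages.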

๐Ÿณ Dockerfile

Usage example:

  1. Pull the image from Docker Hub with docker pull jlnetosci/archive:v0.1.0
  2. Use docker run -v <path/to/local/archive>:/usr/local/etc jlnetosci/archive:v0.1.0 archive <url_to_save> <folder_name> <prefix_of_personal_web_archive> to run the application.

Update README.md

The summary table in README.md is updated by running the summary_table.R script with the Docker image jlnetosci/r-minimal-knitr:1.43. Usage example: docker run --rm -v <path/to/local/archive>:/root jlnetosci/r-minimal-knitr:1.43 Rscript summary_table.R
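The repository builds this table with R and knitr. Purely to illustrate the idea, and not as a replacement for summary_table.R, here is a Python sketch that combines three of the list files into a Markdown table. The column headers (original, wayback, page) follow the summary table in this README; the mapping from list file to column, and the function names `read_list` and `build_summary_table`, are assumptions.

```python
from pathlib import Path


def read_list(path):
    """Return the non-empty lines of a .lst file."""
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


def build_summary_table(base_dir):
    """Build a Markdown table with one row per archived page."""
    base = Path(base_dir)
    # Assumed mapping: column header -> list file.
    columns = {
        "original": read_list(base / "original.lst"),
        "wayback": read_list(base / "wayback.lst"),
        "page": read_list(base / "archive.lst"),
    }
    # Every list must describe the same number of pages.
    lengths = {len(values) for values in columns.values()}
    if len(lengths) != 1:
        raise ValueError("list files have different numbers of entries")

    header = "| " + " | ".join(columns) + " |"
    divider = "|" + " --- |" * len(columns)
    rows = ["| " + " | ".join(row) + " |" for row in zip(*columns.values())]
    return "\n".join([header, divider, *rows])
```

The length check guards against a partially failed run leaving one list shorter than the others, which would otherwise silently misalign rows.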

Summary table

| original | wayback | page |
| --- | --- | --- |
| https://www.freecodecamp.org/news/how-to-dual-boot-windows-10-and-ubuntu-linux-dual-booting-tutorial | https://web.archive.org/web/20230703192448/https://www.freecodecamp.org/news/how-to-dual-boot-windows-10-and-ubuntu-linux-dual-booting-tutorial | https://raw.githack.com/jlnetosci/archive/main/pages/ubuntu_dual_boot/how-to-dual-boot-windows-10-and-ubuntu-linux-dual-booting-tutorial.html |
| https://ucdavis-bioinformatics-training.github.io/2018-June-RNA-Seq-Workshop/thursday/DE.html | https://web.archive.org/web/20230705131451/https://ucdavis-bioinformatics-training.github.io/2018-June-RNA-Seq-Workshop/thursday/DE.html | https://raw.githack.com/jlnetosci/archive/main/pages/RNAseq_tutorial/DE.html |
| https://github.com/GeneralMills/pytrends/issues/550 | https://web.archive.org/web/20230710101645/https://github.com/GeneralMills/pytrends/issues/550 | https://raw.githack.com/jlnetosci/archive/main/pages/google_trends_issues/550.html |
| https://stackoverflow.com/questions/43661251/how-to-manually-change-text-color-of-ggplot2-legend-in-r | https://web.archive.org/web/20230711221055/https://stackoverflow.com/questions/43661251/how-to-manually-change-text-color-of-ggplot2-legend-in-r | https://raw.githack.com/jlnetosci/archive/main/pages/ggplot2_legend_text_color/how-to-manually-change-text-color-of-ggplot2-legend-in-r.html |
| https://rmisstastic.netlify.app/how-to/python/generate_html/how%20to%20generate%20missing%20values | https://web.archive.org/web/20230718105110/https://rmisstastic.netlify.app/how-to/python/generate_html/how%20to%20generate%20missing%20values | https://raw.githack.com/jlnetosci/archive/main/pages/missing_values_python/how%20to%20generate%20missing%20values.html |
| https://www.biostars.org/p/9528226/ | https://web.archive.org/web/20230725081931/https://www.biostars.org/p/9528226/ | https://raw.githack.com/jlnetosci/archive/main/pages/public_bioinformatics_servers/index.html |
| https://discourse.jupyter.org/t/mybinder-and-multiprocessing/3238 | https://web.archive.org/web/20230725165857/https://discourse.jupyter.org/t/mybinder-and-multiprocessing/3238 | https://raw.githack.com/jlnetosci/archive/main/pages/mybinder_multiprocessing/3238.html |
| https://mybinder.readthedocs.io/en/latest/about/user-guidelines.html | https://web.archive.org/web/20230725170057/https://mybinder.readthedocs.io/en/latest/about/user-guidelines.html | https://raw.githack.com/jlnetosci/archive/main/pages/mybinder_guidelines/user-guidelines.html |
| https://math.stackexchange.com/questions/3310277/how-to-calculate-cumulative-s-d | https://web.archive.org/web/20230727155624/https://math.stackexchange.com/questions/3310277/how-to-calculate-cumulative-s-d | https://raw.githack.com/jlnetosci/archive/main/pages/cumulative_standard_deviation/how-to-calculate-cumulative-s-d.html |
| http://scholarpedia.org/article/K-nearest_neighbor | https://web.archive.org/web/20230728182404/http://scholarpedia.org/article/K-nearest_neighbor | https://raw.githack.com/jlnetosci/archive/main/pages/knn_datasets/K-nearest_neighbor.html |
| https://komodor.com/learn/git-errors/ | https://web.archive.org/web/20230802122746/https://komodor.com/learn/git-errors/ | https://raw.githack.com/jlnetosci/archive/main/pages/git_errors/index.html |
| https://stackoverflow.com/questions/57983431/whats-the-most-space-efficient-way-to-compress-serialized-python-data | https://web.archive.org/web/20230803141338/https://stackoverflow.com/questions/57983431/whats-the-most-space-efficient-way-to-compress-serialized-python-data | https://raw.githack.com/jlnetosci/archive/main/pages/compress_pickle/whats-the-most-space-efficient-way-to-compress-serialized-python-data.html |
| https://ctan.org/tex-archive/fonts/atkinson | https://web.archive.org/web/20230804195452/https://ctan.org/tex-archive/fonts/atkinson | https://raw.githack.com/jlnetosci/archive/main/pages/atkinson_hyperlegible_tabular/atkinson.html |
