
mementoembed's Introduction


MementoEmbed

Image of a Social Card

MementoEmbed is a tool to create archive-aware embeddable surrogates for archived web pages (mementos), like the social card above. MementoEmbed is different from other surrogate-generation systems in that it provides access to archive-specific information, such as the original domain of the URI-M, its memento-datetime, and to which collection a memento belongs.

MementoEmbed can also create browser thumbnails like the one below.

Image of a Browser Thumbnail

In addition, MementoEmbed can create imagereels, animated GIFs of the best five images from the memento, as seen below.

Image of an Imagereel

For more information on this application, please visit our Documentation Page and read the original blog post describing the reasons behind MementoEmbed.

Installation and Execution

MementoEmbed relies on Redis for caching. Install Redis first and then follow the directions for your applicable Linux/Unix system below.

Installing on a CentOS 8 System

If you would like to use the RPM installer for RHEL 8 and CentOS 8 systems:

  1. download the RPM and save it to the Linux server (e.g., MementoEmbed-0.20211106041644-1.el8.x86_64.rpm)
  2. type dnf install MementoEmbed-0.20211106041644-1.el8.x86_64.rpm
  3. type systemctl start mementoembed.service

If the service does not work at first, you may need to run systemctl start redis.

To remove MementoEmbed, type dnf remove MementoEmbed (it is case sensitive). The uninstall process will create a tarball of the /opt/mementoembed/var directory. This contains the thumbnail cache, imagereel cache, and logs. It is left in case the system administrator needs this data.

MementoEmbed can now be accessed from http://localhost:5550/.

Installing on an Ubuntu 21.04+ System

If you would like to use the DEB installer for Ubuntu 21.04+ systems:

  1. download the DEB and save it to the Linux server (e.g., MementoEmbed-0.20211112212747.deb)
  2. type apt-get update <-- this may not be necessary, but is needed in some cases to make sure dependencies are loaded
  3. type apt-get install ./MementoEmbed-0.20211112212747.deb <-- the ./ is important, do not leave it off
  4. type systemctl start mementoembed.service

If the service does not work at first, you may need to run systemctl start redis.

To remove MementoEmbed, type apt-get remove mementoembed (it is case sensitive). The uninstall process will create a tarball of the /opt/mementoembed/var directory. This contains the thumbnail cache, imagereel cache, and logs. It is left in case the system administrator needs this data.

Headless Chromium has a problem on Ubuntu. The issue is known to Google. This may manifest in a log with a message such as ERROR:gpu_init.cc(441) Passthrough is not supported, GL is disabled. MementoEmbed still appears to generate thumbnails, so we are waiting for Google to address the issue.

MementoEmbed can now be accessed from http://localhost:5550/.

Installing on a generic Unix System

If you would like to use the generic installer for Unix (including macOS):

  1. download the generic installer (e.g., install-mementoembed-0.20211112212747.sh)
  2. type sudo ./install-mementoembed-0.20211112212747.sh
  3. start MementoEmbed using either systemctl start mementoembed.service (if your Unix/Linux supports systemd) or /opt/mementoembed/start-mementoembed.sh if not

MementoEmbed can now be accessed from http://localhost:5550/.

Installing and Running the Latest Docker Build

To run the latest Docker build use the following commands.

$ docker pull oduwsdl/mementoembed
$ docker run -d -p 5550:5550 oduwsdl/mementoembed

MementoEmbed can now be accessed from http://localhost:5550/.

Installing and Running From Source Using Docker

Download the code and build an image as follows:

$ git clone https://github.com/oduwsdl/MementoEmbed.git
$ cd MementoEmbed
$ docker build -t mementoembed .

Then run a container from this image:

$ docker run -it --rm -p 5550:5550 mementoembed

Flags -it and --rm will make the container connect to the host TTY in interactive mode and remove the container once the process is killed or terminated. To run the container in detached mode, run the following command instead:

$ docker run -d -p 5550:5550 mementoembed

In either case, the application should be accessible at http://localhost:5550/.

Installing and Running Locally From Source With PIP

Download the code and install it within your Python environment.

$ git clone https://github.com/oduwsdl/MementoEmbed.git
$ cd MementoEmbed
$ pip install .

Then set it up to run locally using Flask.

$ export FLASK_APP=mementoembed
$ flask run

Loading a Desired Configuration

The configuration options for MementoEmbed are documented in sample_appconfig.cfg.

The defaults are stored in config/default.py.

To use your own configuration file, copy sample_appconfig.cfg, make modifications, and place it in /etc/mementoembed.cfg. Then run the application locally as described above.

To use your own configuration file stored at /path/to/my/config.cfg with a Docker image, use the -v Docker option:

$ docker run -it --rm -v /path/to/my/config.cfg:/etc/mementoembed.cfg -p 5550:5550 oduwsdl/mementoembed

Directory Layout

The following directory structure exists for organizing MementoEmbed:

  • /config/ - default Flask configuration for MementoEmbed
  • /docs/ - source for the MementoEmbed documentation; the built documentation can be viewed at the project Documentation Page
  • /githooks/ - hooks for use with Git in development (was an experiment, not currently used)
  • /mementoembed/ - main MementoEmbed application
  • /mementoembed/services/ - source code for the machine-accessible MementoEmbed endpoints
  • /mementoembed/static/ - JavaScript and CSS used for the MementoEmbed application
  • /mementoembed/templates/ - Jinja2 templates for the MementoEmbed application
  • /mementoembed/ui/ - code for the user interface MementoEmbed endpoints
  • /tests/unit - automated unit tests for core MementoEmbed functionality
  • /tests/integration - automated integration tests to run against a running MementoEmbed container
  • .dockerignore - used to indicate which files Docker should ignore when building an image
  • .gitignore - used to indicate which files Git should not commit during development
  • .travis.yml - configuration for executing unit tests and testing build of MementoEmbed
  • CONTRIBUTING.md - instructions for contributing to this project
  • Dockerfile - used to build the docker image
  • LICENSE - the license for this project
  • MANIFEST.in - used to ensure additional files are installed on the system when pip is run
  • Makefile - used to build install packages
  • README.md - this file
  • dockerstart.sh - the script run by Docker to start MementoEmbed once a container is started
  • mementoembed-install-script.sh - script included in the generic Unix install package
  • mementoembed.control - DEB installer information file
  • mementoembed.postinst - DEB installer post-install script
  • mementoembed.postun - DEB installer post-uninstall script
  • mementoembed.preinst - DEB installer pre-install script
  • mementoembed.spec - RPM installer configuration file
  • package-lock.json - package version information used by npm for thumbnail generation
  • raiseversion.sh - a script run to raise the version of MementoEmbed in both documentation and source code
  • release.sh - script planned for use when releasing MementoEmbed (not currently used, may be removed at some point)
  • requirements.txt - listing of requirements used in the Docker container's Python environment
  • sample_appconfig.cfg - MementoEmbed configuration used by the Docker container
  • setup.py - standard Python installation configuration file
  • tagversion.sh - a script run to raise the version of MementoEmbed and tag it for release
  • template_appconfig.cfg - a template of a MementoEmbed configuration used by the generic Unix, DEB, and RPM installers

Run unit tests

The unit tests are designed to be easily run from the setup.py file.

$ pip install .
$ python ./setup.py test

Run integration tests

With a fully operational MementoEmbed, integration tests are possible.

$ python -m unittest discover -s tests/integration

Integration tests, by default, assume that the instance to be tested is running at port 5550. This can be altered with the TESTPORT environment variable, like so: export TESTPORT=9000.
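
As a rough illustration of how a test module might pick up this setting, the sketch below reads TESTPORT and falls back to 5550 when it is unset. The helper name base_url is hypothetical and is not taken from the MementoEmbed test suite.

```python
import os


def base_url(environ=os.environ):
    """Build the base URL of the instance under test.

    Hypothetical helper: uses port 5550 when TESTPORT is not set.
    """
    port = environ.get("TESTPORT", "5550")
    return f"http://localhost:{port}"


if __name__ == "__main__":
    print(base_url())
```

Passing the environment as a parameter keeps the helper easy to exercise without changing the real process environment.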

Integration tests are heavily dependent on environmental factors such as the current state of web archive playback systems. The favicon detection appears to be especially unpredictable. Because of this, we recommend that integration tests be reviewed by humans and not executed automatically on build.

Run CentOS 8 test environment

$ docker build --rm -t local/c8-systemd -f tests/installer/centos8/centos8-systemd-Dockerfile .
$ docker run --privileged -v /sys/fs/cgroup:/sys/fs/cgroup:ro -d -p 5550:5550 local/c8-systemd

From here use common docker commands (e.g., docker cp, docker exec) to interact with the container.

Run Ubuntu 21.04 test environment

$ docker build --rm -t local/u2104-systemd -f tests/installer/ubuntu2104/ubuntu2104-systemd-Dockerfile .
$ docker run --privileged -v /sys/fs/cgroup:/sys/fs/cgroup:ro -d -p 5550:5550 local/u2104-systemd

From here use common docker commands (e.g., docker cp, docker exec) to interact with the container.

Contributing

Please consult the Contribution Guidelines in CONTRIBUTING.md for submitting bug reports, pull requests, etc.

mementoembed's People

Contributors

himarshaj, ibnesayeed, machawk1, shawnmjones



mementoembed's Issues

FutureWarning for the way a None object is checked

After the server boots, it logs a warning when the first card creation request arrives.

/app/mementoembed/mementosurrogate.py:785: FutureWarning: The behavior of this method will change in future versions. Use specific 'len(elem)' or 'elem is not None' test instead.
  if maxpara:
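
This message is typical of truth-testing an ElementTree-style element, where `if elem:` asks whether the element has children rather than whether it exists. The sketch below assumes maxpara is such an element (the issue does not say so) and uses the standard library's xml.etree.ElementTree to show the explicit check the warning recommends. The helper name has_element is hypothetical.

```python
import xml.etree.ElementTree as ET


def has_element(elem):
    """Return True when an element was found, even if it has no children."""
    # 'if elem:' would test for child elements and trigger the warning;
    # an identity check against None asks the question actually intended.
    return elem is not None


root = ET.fromstring("<article><p>Only paragraph</p></article>")
maxpara = root.find("p")      # an element with text but no child elements
missing = root.find("table")  # None: there is no such element

print(has_element(maxpara), has_element(missing))  # prints: True False
```

If the intent is instead to test for children, `len(elem) > 0` states that explicitly.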

Implement thumbnail generation

A feature of oEmbed is the generation of thumbnails based on given height/width requirements. MementoEmbed needs to support this.

Improve the error message for non-mementos

Right now, the following error message appears on a red background:

The URL you supplied (https://www.flexispy.com/) is not a memento or comes from an archive that is not Memento-Compliant.

For a live web resource, you can create a memento that resides on the web in the following ways:

* Using the Internet Archive's Save Page Now button.
* Saving the web page at Archive.is
* Using the ArchiveNow service.
* Using a browser plugin, like Mink.

Happy Memento Making!

The red background is not friendly and there should be links to the recommended services.

Add line ending to all files

Many files in this repo currently do not have a trailing newline character, which is not standard practice.

First attempt to create a card fails

Run the server, load the page http://localhost:5550/, enter a URI-M, and click the "Create a Social Card" button. On the first attempt the card creation request is not processed. Instead, the page is redirected to http://localhost:5550/?# (notice the added ?# part in the URL), the form is cleared, and a message is logged in the console, saying we have failure with data: [object Object]. Any further attempts are processed as expected (unless the ?# is removed from the URL again).

Use unobtrusive JavaScript for event listeners

Placing JavaScript in HTML attribute values is generally not considered good practice. When all the JS lives in an external file, none of it should leak into the markup. The click handlers (such as onclick="requestEmbed();") in the following code can be removed and bound externally in an unobtrusive way.

<button type="button" class="btn btn-primary" onclick="requestEmbed();">Create a Social Card</button>
<button type="reset" class="btn btn-secondary" onclick="clearEmbed();">Clear URL</button>

Support for CarbonDate service

MementoEmbed should at least link to the CarbonDate service for a given URI. Perhaps the CarbonDate endpoint could be made configurable?

Test with URI-Ms from Bibliotheca Alexandrina Web Archive

This requirement will be delayed until we have URI-Ms to test.

"Please note that the Bibliotheca Alexandrina is currently migrating the web archive collection to a new storage system. Therefore, availability of archived webpages is not guaranteed for the time being. We appreciate your patience, and we look forward to the collection being fully available once again soon."

Fix issue with #? in URI

It appears that MementoEmbed does not execute properly at first load. If a user submits a URI-M at /, the system reloads the page to /#? rather than submitting the request. Once this reload has occurred, subsequent URI-M submissions are successful.

Time zone missing from card date

At the bottom of the card, there is a datetime and source, but the time is ambiguous because it lacks a time zone. "GMT", "Z", or whatever is applicable ought to be appended.

MementoEmbed 6563c01 using Docker.

screen shot 2018-06-19 at 5 19 36 pm
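
One possible way to make the card date unambiguous is sketched below. It assumes the card date comes from a Memento-Datetime header value in the usual HTTP date format, and the function name card_datetime and the output layout are hypothetical rather than taken from MementoEmbed.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime


def card_datetime(memento_datetime_header):
    """Format an HTTP-style date string with an explicit GMT label."""
    dt = parsedate_to_datetime(memento_datetime_header)
    # Normalize to UTC so the appended label is accurate even when the
    # input carries a numeric offset.
    dt = dt.astimezone(timezone.utc)
    return dt.strftime("%Y-%m-%d %H:%M:%S") + " GMT"


print(card_datetime("Mon, 04 Jun 2018 11:01:41 GMT"))
# prints: 2018-06-04 11:01:41 GMT
```

The sketch does not handle header values that lack zone information; those would be interpreted as local time by astimezone.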

ResourceWarning: unclosed ssl.SSLSocket

Is there anything we can do about this?

ResourceWarning: unclosed <ssl.SSLSocket fd=7, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('192.168.86.26', 50357), raddr=('207.241.225.186', 443)>

This only shows up during unit tests and appears to be related to:
psf/requests#3912
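
If the warning turns out to be harmless noise, one option is to filter it in the unit tests only. The sketch below is an assumption about how that could look, not a change made in this project. The class name QuietSocketTestCase is hypothetical, and only the standard library is used.

```python
import unittest
import warnings


class QuietSocketTestCase(unittest.TestCase):
    """Hypothetical base class that hides unclosed-SSL-socket warnings."""

    def setUp(self):
        # The unittest runner installs its own warning filter when a run
        # starts, so this filter is added in setUp, after that has happened.
        warnings.filterwarnings(
            "ignore",
            category=ResourceWarning,
            message=r"unclosed.*ssl\.SSLSocket",
        )
```

Test classes would inherit from this base class; other ResourceWarning messages still surface because the filter matches only the unclosed SSL socket text.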

Dockerfile improvements

FROM python:3.6.4-stretch

The base image is pinned so specifically that it will not automatically pick up even non-breaking security patches (i.e., the third point release in the semver scheme). For example, python:3.6.4-stretch is already stale and python:3.6.5-stretch is published. We should perhaps use python:3.6-stretch instead to accommodate security patch releases. Better yet, if Python 3.6 is not specifically required and any later version would work, we can use python:3-stretch instead.

# TODO: publish archiveit_utilities so that we don't need to do this
RUN git clone https://github.com/shawnmjones/archiveit_utilities.git

RUN cd archiveit_utilities && pip install .

Each Docker instruction adds one layer to the image. It is better to put related tasks and any associated cleanup in a single layer for a cleaner image. Hence, the instructions above can be consolidated into a single one.

Support for MementoDamage service

MementoEmbed should at least link to the MementoDamage service for a given URI. Perhaps the MementoDamage endpoint could be made configurable?

Force cache hits during automated testing

In spite of using a custom heuristic with cachecontrol, the system still skips the cache for some requests. Because web archives will likely block too many requests for the same URI-M, this will need to be alleviated for automated testing to be used with Travis CI.

Ensure that this project works with archive.is

This project has issues with resolving content from archive.is URI-Ms. A potential solution may exist in the use of the ZIP URI containing the content that is rendered within the archive.is banner.

Missing thumbnail image in cards

Every time I try to request a card for a URI-M like https://web.archive.org/web/20180604110141/http://www.example.com/, the card is missing the thumbnail image and a 404 is reported in the developer console of the browser which points to a resource at http://localhost:5550/undefined. This might be a duplicate of #30.

Add a LICENSE file

We usually use the MIT license for most of our code, but you can choose whichever feels more suitable to you.

Include images from loaded CSS files

Sometimes all images on a page are loaded via CSS. MementoEmbed should interrogate the CSS files for URIs found in the background-image property.
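
A minimal sketch of that idea follows, assuming the CSS text has already been fetched. It uses a regular expression rather than a real CSS parser, so it only finds the first url(...) in each background-image declaration and skips inline data: URIs. The function name background_image_uris is hypothetical.

```python
import re
from urllib.parse import urljoin

# Captures the URI inside url(...) in a background-image declaration.
_BG_IMAGE = re.compile(
    r"background-image\s*:\s*url\(\s*['\"]?([^'\")]+?)['\"]?\s*\)",
    re.IGNORECASE,
)


def background_image_uris(css_text, stylesheet_uri):
    """Return absolute URIs found in background-image declarations."""
    uris = []
    for ref in _BG_IMAGE.findall(css_text):
        if ref.lower().startswith("data:"):
            continue  # inline images have nothing to fetch
        # Relative references resolve against the stylesheet, not the page.
        uris.append(urljoin(stylesheet_uri, ref))
    return uris


css = """
.hero { background-image: url("img/hero.png"); }
.logo { background-image:url(/static/logo.gif) }
.plain { color: red; }
"""
print(background_image_uris(css, "https://example.org/css/site.css"))
```

For a memento, the stylesheet URI would itself be an archived URI, so the resolved image URIs stay within the archive.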
