Timemap Visualization (TMVis)

TMVis is a web service to provide visualizations of how individual webpages have changed over time. We offer four visualizations: Image Grid, Image Slider, Timeline, and Animated GIF.

See https://www.cs.odu.edu/~mweigle/tmvis-embed/ for examples of embedding the Image Grid, Image Slider, and Animated GIF in a webpage.

For more details, including a system walk-through and demo video, see "Visualizing Webpage Changes Over Time With TMVis" (May 2020) on the ODU WS-DL Blog.

This work has been supported by an NEH/IMLS Digital Humanities Advancement Grant (HAA-256368-17). We are grateful for the support of NEH and IMLS, and for the input from our partners, Deborah Kempe and Sumitra Duncan from the Frick Art Reference Library and New York Art Resources Consortium and Pamela Graham and Alex Thurman from Columbia University Libraries. This project is an extension of AlSum and Nelson’s ECIR 2014 work, “Thumbnail Summarization Techniques for Web Archives”, and Mat Kelly's (@machawk1) ArchiveThumbnails project, funded by an incentive grant from Columbia University Libraries and the Andrew W. Mellon Foundation.

Please cite this project as indicated in the Citing Project section.

Running as a Docker Container

Running the server in a Docker container can make dependency management easier. This document assumes that you already have Docker set up; if not, follow the official Docker installation guide.

A Docker image from the current version of the code is built and published on Docker Hub at oduwsdl/tmvis. Pull the image and run the service in a container as shown below:

$ docker pull oduwsdl/tmvis
$ docker run --shm-size=1G -it --rm -p 3000:3000 oduwsdl/tmvis

Alternatively, a custom Docker image can be built from the source. To do this, clone the repository, change the working directory, then build and run the image as follows:

$ git clone https://github.com/oduwsdl/tmvis.git
$ cd tmvis
$ docker image build -t timemapvis .
$ docker run --shm-size=1G -it --rm -p 3000:3000 timemapvis

With the above command, the container is running and the service can be accessed from outside on port 3000 at http://localhost:3000/. To run the service on a different port, say 80, change -p 3000:3000 to -p 80:3000.

To persist generated thumbnails, mount a host directory as a volume inside the container by adding the -v /SOME/HOST/DIRECTORY:/app/assets/screenshots flag when running the container.

The container is completely transparent from the outside: the service is accessed as if it were running on the host machine itself.

If you want to make changes to the tmvis code itself, you might want to run it in development mode by mounting the code from the host machine inside the container so that changes are reflected immediately, without requiring an image rebuild. Use the following command to mount the code from the host:

$ docker run --shm-size=1G -it --rm -v "$PWD":/app -v /app/node_modules -p 3000:3000 timemapvis

Running Locally

Node.js is required to run the service. To run this program locally, clone the repository, change the working directory, install the dependencies, and run the server file as follows:

$ git clone https://github.com/oduwsdl/tmvis.git
$ cd tmvis
$ npm install
$ node tmvis.js

Usage of the service

Running this service returns an array of JSON objects as the response (web service model). The response can then be visualized with the UI tool deployed at http://tmvis.cs.odu.edu/, whose code is available at https://github.com/oduwsdl/tmvis/ under the public folder.

Supported server arguments are as follows:

  • host is used to specify server's hostname, default localhost
  • port is used to specify server's port, default 3000
  • proxy is used to specify proxy, generated using host and port if not provided
  • debug is used to add logs to server's console, useful for development
  • ssd is used to specify the amount of time, in seconds, to wait before taking a screenshot of a memento, default 2
  • oes overrides the usage of cache files, default false
  • os computes both simhash and hamming distance, default true
  • maxMementos is used to specify the maximum number of mementos to be analyzed, default 1000

To query the running server instance, visit http://localhost:3000/alsummarizedtimemap/archiveIt/1068/4/[histogram | stats | summary]/[from]/[to]/http://4genderjustice.org/ in your browser. The path has the form primesource/ci/hammingdistance/role/from/to/URI-R; substitute the URI-R to request a different site's summarization.

Parameter Definitions:

  • primesource takes the value 'archiveit', 'arquivopt', or 'internetarchive' to tell the service which archive is the primary source.
  • ci is used to specify the collection identifier; if none is specified, the argument 'all' is used.
  • hammingdistance is the Hamming distance value passed in the URI (4 in the examples below).
  • role: This is used to specify the values 'histogram', 'stats' or 'summary'.
    • histogram: to get the dates and times of a timemap in the specified date range.
    • stats: to get the number of unique mementos.
    • summary: to get the unique mementos along with the screenshots captured.
  • from and to: These parameters are used to specify the date range of the timemap to be loaded (/0/0/ for full timemap or /YYYY-MM-DD/YYYY-MM-DD format for a specific date range).
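
The parameters above can be assembled into a request URI programmatically. The following is a minimal Python sketch, not part of TMVis itself: the function name build_tmvis_uri and its defaults are illustrative assumptions, and the segment order (with the Hamming distance between ci and role) is taken from the example URIs in this README.

```python
from datetime import date

VALID_ROLES = {"histogram", "stats", "summary"}


def build_tmvis_uri(uri_r, role, primesource="internetarchive", ci="all",
                    hamming_distance=4, from_date=None, to_date=None,
                    base="http://localhost:3000"):
    """Assemble a request URI (hypothetical helper, not shipped with TMVis)."""
    if role not in VALID_ROLES:
        raise ValueError(f"role must be one of {sorted(VALID_ROLES)}")
    # "0" selects the full timemap; otherwise dates are sent as YYYY-MM-DD.
    start = from_date.isoformat() if from_date else "0"
    end = to_date.isoformat() if to_date else "0"
    return (f"{base}/alsummarizedtimemap/{primesource}/{ci}/"
            f"{hamming_distance}/{role}/{start}/{end}/{uri_r}")


print(build_tmvis_uri("http://4genderjustice.org/", "stats",
                      from_date=date(2016, 8, 1), to_date=date(2017, 7, 23)))
```

The URI-R is appended unencoded because that is how it appears in the example URIs; whether the server also accepts a percent-encoded URI-R is not stated here.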

Example URIs

Full timemaps

  • http://localhost:3000/alsummarizedtimemap/archiveIt/1068/4/histogram/0/0/http://4genderjustice.org/
  • http://localhost:3000/alsummarizedtimemap/archiveIt/1068/4/stats/0/0/http://4genderjustice.org/
  • http://localhost:3000/alsummarizedtimemap/archiveIt/1068/4/summary/0/0/http://4genderjustice.org/

Date range (Format: YYYY-MM-DD)

  • http://localhost:3000/alsummarizedtimemap/internetarchive/all/4/stats/2016-08-01/2017-07-23/http://4genderjustice.org/
  • http://localhost:3000/alsummarizedtimemap/internetarchive/all/4/summary/2016-08-01/2017-07-23/http://4genderjustice.org/

Request format (Role -> histogram)

curl -il http://localhost:3000/alsummarizedtimemap/archiveIt/1068/4/histogram/0/0/http://4genderjustice.org/

The attributes of the URI map to values as follows:
  primesource -> archiveIt
  collection Identifier -> 1068
  hammingdistance -> 4
  role -> histogram
  from date -> 0
  to date -> 0
  URI-R under request -> http://4genderjustice.org/
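
The same mapping can be recovered from a request URI in code. This is a small Python sketch under the assumption that the path always has the seven segments shown in the mapping above; parse_tmvis_uri is a hypothetical helper name.

```python
def parse_tmvis_uri(uri):
    """Split a request URI into its attributes (illustrative helper only)."""
    marker = "/alsummarizedtimemap/"
    _, found, rest = uri.partition(marker)
    if not found:
        raise ValueError("not an alsummarizedtimemap request URI")
    # The URI-R contains slashes itself, so split off only the first six
    # segments and keep the remainder intact as the URI-R.
    parts = rest.split("/", 6)
    if len(parts) != 7:
        raise ValueError("expected primesource/ci/hammingdistance/role/from/to/URI-R")
    keys = ("primesource", "ci", "hammingdistance", "role", "from", "to", "uri_r")
    return dict(zip(keys, parts))


example = ("http://localhost:3000/alsummarizedtimemap/archiveIt/1068/4/"
           "histogram/0/0/http://4genderjustice.org/")
for key, value in parse_tmvis_uri(example).items():
    print(f"{key} -> {value}")
```

All values are returned as strings, so a caller that needs the Hamming distance as a number has to convert it.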

Response format

[
  "Jul 01, 2015 21:56:41",
  "Jul 01, 2015 22:32:40",
  "Oct 01, 2015 21:17:52",
  ....
  "Oct 24, 2019 00:52:02",
  "Jan 23, 2020 23:10:05",
  "Jan 23, 2020 23:10:25"
]
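
A client can turn this flat list of datetime strings into counts per month, for example to draw its own histogram. The Python sketch below assumes the strings always follow the "Mon DD, YYYY HH:MM:SS" layout of the sample and that month abbreviations are in English; mementos_per_month is a hypothetical name.

```python
from collections import Counter
from datetime import datetime


def mementos_per_month(datetimes):
    """Count memento datetimes per calendar month, keyed as YYYY-MM."""
    counts = Counter()
    for text in datetimes:
        # %b parses the abbreviated month name (locale dependent; English assumed).
        captured = datetime.strptime(text, "%b %d, %Y %H:%M:%S")
        counts[captured.strftime("%Y-%m")] += 1
    return dict(sorted(counts.items()))


sample = [
    "Jul 01, 2015 21:56:41",
    "Jul 01, 2015 22:32:40",
    "Oct 01, 2015 21:17:52",
    "Oct 24, 2019 00:52:02",
    "Jan 23, 2020 23:10:05",
    "Jan 23, 2020 23:10:25",
]
print(mementos_per_month(sample))
```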

Request format (Role -> stats)

curl -il http://localhost:3000/alsummarizedtimemap/archiveIt/1068/4/stats/0/0/http://4genderjustice.org/

The attributes of the URI map to values as follows:
  primesource -> archiveIt
  collection Identifier -> 1068
  hammingdistance -> 4
  role -> stats
  from date -> 0
  to date -> 0
  URI-R under request -> http://4genderjustice.org/

Response format

[
  {
    "threshold":2,
    "totalmementos":42,
    "unique":9,
    "timetowait":5,
    "fromdate":"Wed, 01 Jul 2015 21:56:41 GMT",
    "todate":"Wed, 24 Jul 2019 02:42:08 GMT"
  },
  ....
  {
    "threshold":12,
    "totalmementos":42,
    "unique":2,
    "timetowait":2,
    "fromdate":"Wed, 01 Jul 2015 21:56:41 GMT",
    "todate":"Wed, 24 Jul 2019 02:42:08 GMT"
  }
]
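
One way a client might use the stats response is to pick a threshold that keeps the number of thumbnails manageable. The Python sketch below assumes that "unique" is the number of unique mementos found at the given "threshold"; both helper names are hypothetical, and the sample entries are the two shown above.

```python
def unique_by_threshold(stats):
    """Map each threshold to its reported count of unique mementos."""
    return {entry["threshold"]: entry["unique"] for entry in stats}


def smallest_threshold_within(stats, max_thumbnails):
    """Return the lowest threshold whose unique count fits the budget, or None."""
    candidates = [entry["threshold"] for entry in stats
                  if entry["unique"] <= max_thumbnails]
    return min(candidates) if candidates else None


sample_stats = [
    {"threshold": 2, "totalmementos": 42, "unique": 9, "timetowait": 5},
    {"threshold": 12, "totalmementos": 42, "unique": 2, "timetowait": 2},
]
print(unique_by_threshold(sample_stats))
print(smallest_threshold_within(sample_stats, 5))
```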

Request format (Role -> summary)

curl -il http://localhost:3000/alsummarizedtimemap/archiveIt/1068/4/summary/0/0/http://4genderjustice.org/

The attributes of the URI map to values as follows:
  primesource -> archiveIt
  collection Identifier -> 1068
  hammingdistance -> 4
  role -> summary
  from date -> 0
  to date -> 0
  URI-R under request -> http://4genderjustice.org/

Response format

[
  {
    "timestamp": 1435787801,
    "event_series": "Thumbnails",
    "event_html": "http://localhost:3000/static/timemapSum_httpwaybackarchiveitorg106820150701215641http4genderjusticeorg.png",
    "event_date": "Aug. 01, 2015",
    "event_display_date": "2015-07-01, 21:56:41",
    "event_description": "",
    "event_link": "http://wayback.archive-it.org/1068/20150701215641/http://4genderjustice.org/"
  },
  {
    "timestamp": 1435789960,
    "event_series": "Non-Thumbnail Mementos",
    "event_html": "http://localhost:3000/static/notcaptured.png",
    "event_html_similarto": "http://localhost:3000/static/timemapSum_httpwaybackarchiveitorg106820150701215641http4genderjusticeorg.png",
    "event_date": "Aug. 01, 2015",
    "event_display_date": "2015-07-01, 22:32:40",
    "event_description": "",
    "event_link": "http://wayback.archive-it.org/1068/20150701223240/http://4genderjustice.org/"
  },....
]
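
A client consuming the summary response will typically separate mementos that have their own thumbnail from those that do not. The Python sketch below assumes "timestamp" is Unix time in seconds (UTC), which agrees with the event_display_date values in the sample; split_summary is a hypothetical helper, and the sample keeps only the fields it reads.

```python
from datetime import datetime, timezone


def split_summary(events):
    """Partition summary events into (thumbnails, non_thumbnails).

    Each item is a (capture datetime in UTC, memento link) pair.
    """
    thumbnails, others = [], []
    for event in events:
        captured = datetime.fromtimestamp(event["timestamp"], tz=timezone.utc)
        row = (captured, event["event_link"])
        if event["event_series"] == "Thumbnails":
            thumbnails.append(row)
        else:
            others.append(row)
    return thumbnails, others


sample_events = [
    {"timestamp": 1435787801, "event_series": "Thumbnails",
     "event_link": "http://wayback.archive-it.org/1068/20150701215641/http://4genderjustice.org/"},
    {"timestamp": 1435789960, "event_series": "Non-Thumbnail Mementos",
     "event_link": "http://wayback.archive-it.org/1068/20150701223240/http://4genderjustice.org/"},
]
thumbs, rest = split_summary(sample_events)
print(len(thumbs), "with thumbnails;", len(rest), "without")
```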

Request format (Role -> stats) (Date range)

curl -il http://localhost:3000/alsummarizedtimemap/internetarchive/all/4/stats/2016-08-01/2017-07-23/http://4genderjustice.org/

The attributes of the URI map to values as follows:
  primesource -> internetarchive
  collection Identifier -> all
  hammingdistance -> 4
  role -> stats
  from date -> 2016-08-01
  to date -> 2017-07-23
  URI-R under request -> http://4genderjustice.org/

Response format

[
  {
    "threshold":2,
    "totalmementos":91,
    "unique":6,
    "timetowait":4,
    "fromdate":"Tue, 02 Aug 2016 16:39:55 GMT",
    "todate":"Sat, 22 Jul 2017 06:49:56 GMT"
  },
  ....
  {
    "threshold":7,
    "totalmementos":91,
    "unique":1,
    "timetowait":1,
    "fromdate":"Tue, 02 Aug 2016 16:39:55 GMT",
    "todate":"Sat, 22 Jul 2017 06:49:56 GMT"
  }
]

Request format (Role -> summary) (Date range)

curl -il http://localhost:3000/alsummarizedtimemap/internetarchive/all/4/summary/2016-08-01/2017-07-23/http://4genderjustice.org/

The attributes of the URI map to values as follows:
  primesource -> internetarchive
  collection Identifier -> all
  hammingdistance -> 4
  role -> summary
  from date -> 2016-08-01
  to date -> 2017-07-23
  URI-R under request -> http://4genderjustice.org/

Response format

[
  {
    "timestamp":1470155995,
    "event_series":"Thumbnails",
    "event_html":"http://localhost:3000/static/timemapSum_httpwebarchiveorgweb20160802163955http4genderjusticeorg.png",
    "event_date":"Aug. 02, 2016",
    "event_display_date":"2016-08-02, 16:39:55",
    "event_description":"",
    "event_link":"http://web.archive.org/web/20160802163955/http://4genderjustice.org/"
  },
  ....
  {
    "timestamp":1500706196,
    "event_series":"Non-Thumbnail Mementos",
    "event_html":"notcaptured",
    "event_html_similarto":"http://localhost:3000/static/timemapSum_httpwebarchiveorgweb20170714114554http4genderjusticeorg.png",
    "event_date":"Jul. 22, 2017",
    "event_display_date":"2017-07-22, 06:49:56",
    "event_description":"",
    "event_link":"http://web.archive.org/web/20170722064956/http://4genderjustice.org/"
  }
]

Citing Project

A tech report related to this project is available on arXiv.org (pdf). Please cite it as follows:

Abigail Mabe, Dhruv Patel, Maheedhar Gunnam, Surbhi Shankar, Mat Kelly, Sawood Alam, Michael L. Nelson, and Michele C. Weigle. Visualizing Webpage Changes Over Time. Technical report arXiv:2006.02487, June 2020.

@techreport{tmvis-arxiv-2020,
  author    = {Abigail Mabe and Dhruv Patel and Maheedhar Gunnam and Surbhi Shankar and Mat Kelly and Sawood Alam and Michael L. Nelson and Michele C. Weigle},
  title     = {Visualizing Webpage Changes Over Time},
  year      = {2020},
  month     = jun,
  number    = {arXiv:2006.02487},
  url       = {https://arxiv.org/abs/2006.02487}
}

Regarding License

Although the base of this repository (https://github.com/machawk1/ArchiveThumbnails) was GPL-licensed, this repository is under the MIT license. The license was changed with permission from the original author, @machawk1.

tmvis's Issues

Difficulties in hovering over histogram values

After entering a URI in the service and hitting Calculate, a histogram is displayed:

[Screenshot: histogram, 2019-08-28]

Hovering over a bar shows the number of mementos available for the time slice, where each bar represents a month.

One must be very precise to move the mouse a certain number of pixels to hover over the bar. This is quite inaccessible. Hovering "above" the bar does not display the value, as done on IA's Wayback Machine toolbar and interface. Very small values make the "hotspot" to activate the hover as small as a single pixel or two.

Please change this effect to activate on hovering within the histogram anywhere in the vertical space for the respective bar.

What is "CONTAINER ID CREATED ABOVE"?

The instructions for running this using Docker reference what I assume is a placeholder stating, "(CONTAINER ID CREATED ABOVE)". From where does a user get this value? The README is unclear on this.

Not working on Windows 7

OS: Windows 7 Enterprise SP 1 (CS conference room computer)

Firefox ESR 38.5.2
After clicking "Calculate # of Thumbnails", there's a small popup near the back button that says "Please fill out this field" and nothing further happens.

Chrome 48.0.2564.103m
After clicking "Calculate # of Thumbnails", nothing happens.

Relative non-URIs path literals produce error

Testing the service at http://tmvis.cs.odu.edu

I enter ../../../../ as the text input. Hitting calculate then shows a blank page with the error: Cannot GET /alsummarizedview/

../../../ displays Cannot GET /alsummarizedview/internetarchive/.

../../ displays Cannot GET /alsummarizedview/internetarchive/all/

etc.

This error should probably keep the user within the interface and report that the "URI" is invalid as input instead of showing a blank page with the above respective message.
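
The kind of input check suggested above could look like the following Python sketch. It is a heuristic written for illustration, not the validation TMVis uses: looks_like_uri_r is a hypothetical name, and the rule (http or https scheme plus a dotted hostname with no empty labels) is an assumption that would, for example, reject single-label hosts such as localhost.

```python
from urllib.parse import urlparse


def looks_like_uri_r(text):
    """Heuristically decide whether user input could be a web URI."""
    candidate = text.strip()
    # Bare hostnames such as www.cs.odu.edu are treated as http:// URIs.
    if "://" not in candidate:
        candidate = "http://" + candidate
    parsed = urlparse(candidate)
    host = parsed.hostname or ""
    labels = host.split(".")
    # Require at least two non-empty labels, which rules out "..", ".", and "".
    return (parsed.scheme in ("http", "https")
            and len(labels) >= 2
            and all(labels))


for text in ("../../../../", "../../", "www.cs.odu.edu", "http://4genderjustice.org/"):
    print(repr(text), looks_like_uri_r(text))
```

With a check like this, the interface could report invalid input in place rather than forwarding it to the server.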

Utilize HEAD requests to the service

The HTTP HEAD verb (cf. GET, POST, etc.) is useful both for testing and for users to get brief information without needing to wait for a potentially long process to finish. In my local instance, when I run

curl -I http://localhost:3000/alsummarizedtimemap/archiveIt/1068/4/summary/http://4genderjustice.org/

I am told "METHOD NOT ALLOWED". If you are caching the results at all, expressing the state of the processing could be done via an HTTP response header from a HEAD request (with the same information potentially also being present in the GET).
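
For reference, a client-side HEAD request equivalent to the curl -I call can be sketched in Python with the standard library. The helper names are hypothetical, and nothing here implies the service supports HEAD; per the report above it answered "METHOD NOT ALLOWED", in which case urlopen raises urllib.error.HTTPError.

```python
from urllib.request import Request, urlopen


def head_request(url):
    """Build (but do not send) a HEAD request for the given URL."""
    return Request(url, method="HEAD")


def fetch_headers(url, timeout=10):
    """Send the HEAD request and return (status, headers).

    Raises urllib.error.HTTPError for error responses such as 405.
    """
    with urlopen(head_request(url), timeout=timeout) as response:
        return response.status, dict(response.headers)


request = head_request("http://localhost:3000/")
print(request.get_method(), request.full_url)
```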

Screenshots directory has to be manually created on initial run

I pulled the latest master (b612de9) and followed the instructions in the README to build on Windows 10 (in an attempt to replicate #52).

On running node tmvis.js I am told Error: ENOENT: no such file or directory, mkdir 'C:\Users\Mat Kelly\tmvis\assets\screenshots'. Manually creating this directory (there isn't an assets directory in the repo) allows the aforementioned command to run and the service to be started.

Back page browser button not refreshing

After loading the summary page, clicking the browser back button updates the URL to stats but does not refresh the page. The user has to refresh it manually.

README phrasing

In the Docker instructions, the section title states, "Recommended for naive users". The phrasing of this may be construed as condescending.

Embedded resources not displaying on demo site due to cross-scheme requests

The service at https://tmvis.cs.odu.edu/ uses HTTPS but requests resources (3 CSS files) over HTTP. This causes a "strict-origin-when-cross-origin" error per the Referrer Policy when the page is loaded.

The error reported per the Chrome console:
[Screenshot: Chrome console error]

How the web page currently displays:
[Screenshot: current page rendering]

While these URIs could simply be fixed to use HTTPS, what are the barriers to simply pulling local copies into the web application, thus incurring fewer round trips to remote servers?

Chrome 98, macOS 10.14.6.

TimeMap Vis Locks up sometimes

Environment:
Web browser: Safari Version 11.1 (13605.1.33.1.2)
OS: macOS High Sierra 10.13.4 (17E199)

I submitted the URI-R www.cs.odu.edu and selected 15 unique thumbnails. TimeMap Vis mentioned that it would take some time, so I stepped away for a few minutes. When I returned, the screen was stuck like this:

screen shot 2018-04-09 at 3 50 25 pm

I cannot interact with this screen. Note that Safari appears to have crashed with an error of "This webpage was reloaded because a problem occurred".

I opened a new tab, tried www.cs.odu.edu/~mln/ and got the same result, even though this URI-R had worked successfully earlier:

screen shot 2018-04-09 at 3 56 57 pm

With that attempt, Safari does not display the error message.

A further attempt with www.shawnmjones.org worked fine.

Confusing UI: What do the "2" and "1" buttons mean?

One particular part of the UI is confusing.

  1. I have entered the URI-M https://wayback.archive-it.org/2950/20120110014403/http://OccupyHouston.org/es and clicked "Calculate # of Thumbnails"
  2. I then clicked "Calculate Unique"
  3. I am confronted with instructions that state "The options shown below are based on the amount of difference between the mementos." and am presented with 3 buttons as shown in the screenshot: "2", "1", and "Generate Thumbnails".

The instructions in step number 3 imply that "2" and "1" provide options. After some consideration, I assume that they are paginating something, but I've clicked on both and see no change. Clicking "1" changes the message beneath the buttons to "(estimated 1 minutes to generate)". Clicking "2" changes the message beneath to "(estimated 2 minutes to generate)". What does clicking them change? Do these buttons alter the amount of time dedicated to generating thumbnails?

My suggestion is to be more explicit in the instructions about what is expected of the user.

image
